Christian’s previous post talks about altering user behavior by changing chargeback models. I would like to thank Christian for his efforts to raise awareness about this new approach to charging for data center services. I believe that implementing chargeback models based on power usage will encourage customers to consider power efficiency more seriously and reduce our overall impact on the environment.
I would like to illustrate this point by using the analogy of getting a first car for a teenager. By the way, if you currently have one of these creatures at home, then you have my commiserations.
Let’s say you are investigating the conditions under which you will allow your teenager to have a car. Consider the following two options:
For each option, what sort of car do you think your teenager will want?
With option 1, you may find that they come back with a clapped-out 351 cubic inch monster with the thirst of a Jentil, maybe something like the “striped tomato” from Starsky and Hutch. And who cares? You’re paying for the gas, right?
But there’s no point in having the coolest car in the neighborhood if you can’t even afford to take a date to a drive-in movie. If you select option 2, you may find your teenager develops a healthier interest in how many miles per gallon (MPG) something more sensible can manage (like the new Prius hybrids at the Redmond Campus), rather than whether they have enough torque to leave tire slicks longer than a 747 landing at Princess Juliana International Airport.
I’ve kept this analogy simple to reinforce the message, as driving user behavior through chargeback models in data centers involves more factors than equipping a teenager with a suitable set of wheels. With cars, you have the main mileage-related input from the cost of gas. The service you receive is the number of miles that a set amount of gas will take you. (I’m assuming typical Seattle traffic here, so the fact that you could run a more powerful car with the potential to go faster isn’t a factor, as you’ll still be sat in the same queue over the floating bridge.) If you increase the MPG, you get more output for the same input.
With servers, the situation is more complex; the main issue that we struggle with is the fact that there is no direct equivalent to miles per gallon. How do you measure application output? It isn’t just about CPU utilization. What about an application that makes repeated calls to hard disk but doesn’t use much in the way of processor resources? If you fit more memory to the server, it may be able to cache the disk access calls, but you’re then consuming more power in the additional memory module.
All the standard performance monitoring areas, such as processor, memory, hard disk, network, and cache make varying contributions to power consumption. What we needed at Microsoft was a chargeback model that is easy to understand, straightforward to administer, and allocates data center costs to customers proportionately.
For the Microsoft data centers, the effort to change our chargeback model was not a simple conversion, as it took us one and a half years to move from our previous model of charging for floor space based upon rack utilization to the new model. Not surprisingly, it was not the tooling or process modifications that posed the biggest hurdle but the cultural and political changes that were required. Even today, I frequently have to remind customers that ‘DC space is power.’
For the implementation, we reviewed several methodologies for the chargeback model, ranging in complexity and ease of application. We rapidly discovered that power monitoring at the server is too expensive and complex, even with the newer server motherboards that enable you to measure power usage directly.
You may be wondering as to the identity of our customers. Our customers include all the external Internet services that Microsoft provides, many of which you may already use. These services include:
Internal services, such as Outlook Web Access, mobile e-mail, and SharePoint publishing, are supported by the regional Microsoft IT teams.
The final model that we now use has two basic components:
I’m not using real figures here, but you should be able to see the basis of how we implemented charging based on power consumption.
The first figure we can work from is the total power consumption of a data center. That’s fairly easy to discover, as our energy suppliers have a strong vested interest in charging us accurately for the power we use.
Where possible, we select power tariffs that include a proportion of energy generated from renewable resources. The proportion and type of renewable energy vary according to geography. For example, data centers in mountainous regions such as Canada can draw on hydroelectric resources but tend not to do so well with solar energy. Some European countries with a history of windmills (think mice in clogs) are generating significant amounts of electricity from offshore wind farms. Middle Eastern locations look to solar power as a stable and increasingly cheap power source.
We start by calculating Power Usage Effectiveness (PUE).
PUE = Total Utility Load / Total IT Equipment Power
Note that for PUE, a lower figure is better. There is also the reciprocal of PUE, Data Center infrastructure Efficiency (DCiE), which is expressed as a percentage and defined as:
DCiE = 1 / PUE = (Total IT Equipment Power / Total Utility Load) x 100
At 100% utilization, a typical data center consumes 10 megawatts (MW). A more typical utilization figure is 70%, or 7 MW, of which 3 to 4 MW is IT equipment power. These figures give a PUE rating of between 1.75 and 2.33.
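To make the arithmetic concrete, here is a minimal Python sketch of the two ratios, using the illustrative 7 MW and 3 to 4 MW figures from above. The function names are hypothetical and exist only for this example.

```python
def pue(total_utility_load_mw: float, it_equipment_power_mw: float) -> float:
    """Power Usage Effectiveness: total utility load divided by IT equipment power."""
    if it_equipment_power_mw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_utility_load_mw / it_equipment_power_mw


def dcie_percent(total_utility_load_mw: float, it_equipment_power_mw: float) -> float:
    """Data Center infrastructure Efficiency: the reciprocal of PUE, as a percentage."""
    if total_utility_load_mw <= 0:
        raise ValueError("Total utility load must be positive")
    return it_equipment_power_mw / total_utility_load_mw * 100


if __name__ == "__main__":
    total_load_mw = 7.0  # 70% utilization of a 10 MW facility (illustrative)
    for it_power_mw in (4.0, 3.0):
        print(
            f"IT power {it_power_mw} MW: "
            f"PUE {pue(total_load_mw, it_power_mw):.2f}, "
            f"DCiE {dcie_percent(total_load_mw, it_power_mw):.1f}%"
        )
```

With 4 MW of IT equipment power the PUE works out to 1.75, and with 3 MW it rises to about 2.33, which is the range quoted above.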
The difference between the Total Utility Load and the Total IT Equipment Power comes from items such as losses in uninterruptible power supplies (between 3 and 7%), lighting, cooling, air conditioning, and fire alarms. We should also not forget the most essential item in any data center: the coffee machine. Obviously, we have to ensure that our charge-out rate covers these additional overhead costs.
To determine the amount of floor space that each customer is using, we rate each model of device currently in our data centers. Not surprisingly, our customers frequently ask, ‘Why are you billing me based on the manufacturer ratings, which are higher than the actual power we use?’ The answer is that we do not utilize the manufacturer ratings directly but carry out our own extensive testing at different utilization levels. This testing enables us to rate each device according to its average expected utilization when online and under load.
An interesting effect of moving our chargeback model driver from racks to kW was that many customers with older servers saw their bills go down. This change occurred because newer servers use more power per rack than older models and are typically less power efficient. However, I should emphasize that this lower efficiency energy usage pattern is because server manufacturers currently concentrate on obtaining higher compute densities, rather than maximizing power efficiency. I would expect newer generations of servers that have been designed for minimal power usage to be significantly better than both the current generation and any older server models.
Examples of power efficiencies include using higher capacity memory modules, installing lower Thermal Design Power (TDP) processors, and powering down redundant components, such as network interface cards and power supplies. I will be discussing these areas in greater depth in later posts.
From our device rating, we then apply a chargeback model similar to the following one:
If you are a customer with a rack of ten devices rated at 300W, then we would charge you monthly as follows:
Total per month: $519
By the way, if you are one of our customers, I did say that these aren’t the real figures that we use. So, no, please don’t contact me for a refund!
The simplest way to describe how we determine rates is to say that we divide the total costs by utilization. In the case of Floor Space billing, we look at all the operational costs of our facilities, such as lease, depreciation, electrical and mechanical equipment maintenance and support, and so on and compare these costs to the number of kW we expect to use. This approach includes our overhead costs in the chargeback model.
We then need to include the cost of electricity to the data center. This leads to another question that customers regularly ask, which is “Why isn’t my rate for electricity the same as our power tariffs from the supplier?” For example, Grant County Public Utility District publishes its power rates at http://www.gcpud.org/customerService/billing/ratesFees.htm. For our Quincy, Washington site, the rate is $.022 per kWh. To answer this question, we explain the PUE associated with each site, and how this factors into the billing rate.
Imagine the same customer with 3 kW of servers installed. In one month, using an average of 730 hours per month, those servers will consume 2,190 kWh. Using a PUE of 2.0 (not our actual PUE) results in a total utility consumption of 4,380 kWh, which, at the rate of $.022 per kWh, creates a monthly cost to Microsoft for this customer of $96.36. Incorporating the PUE ratio of 2.0 means that the effective rate that we charge the customer for the 3 kW they use is $.044 per kWh.
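The same calculation can be sketched in a few lines of Python. This is only an illustration of the arithmetic in the paragraph above, with hypothetical function names and the example PUE of 2.0 rather than any real figure.

```python
HOURS_PER_MONTH = 730  # average hours per month used in the example


def monthly_utility_kwh(it_load_kw: float, pue: float,
                        hours: float = HOURS_PER_MONTH) -> float:
    """Energy drawn from the utility for a given IT load, including site overhead."""
    return it_load_kw * hours * pue


def monthly_electricity_cost(it_load_kw: float, pue: float, tariff_per_kwh: float,
                             hours: float = HOURS_PER_MONTH) -> float:
    """Monthly electricity cost attributable to a customer's IT load."""
    return monthly_utility_kwh(it_load_kw, pue, hours) * tariff_per_kwh


def effective_rate_per_kwh(tariff_per_kwh: float, pue: float) -> float:
    """Rate per kWh of IT load once the PUE overhead is folded in."""
    return tariff_per_kwh * pue


if __name__ == "__main__":
    it_load_kw = 3.0   # ten devices rated at 300 W
    example_pue = 2.0  # illustrative, not an actual site PUE
    tariff = 0.022     # dollars per kWh, the Quincy rate quoted above

    print(f"Utility consumption: {monthly_utility_kwh(it_load_kw, example_pue):,.0f} kWh")
    print(f"Monthly cost: ${monthly_electricity_cost(it_load_kw, example_pue, tariff):.2f}")
    print(f"Effective rate: ${effective_rate_per_kwh(tariff, example_pue):.3f} per kWh")
```

Halving the overhead (a PUE of 1.5 instead of 2.0) would drop the effective rate in this sketch from $.044 to $.033 per kWh, which is why the PUE figure matters so much to the bill.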
You can easily see that lowering the PUE immediately reduces the charging rates on customers. This correlation of energy efficiency to the size of their bills enables us to interest customers in the possibility of reducing the energy overhead at the site, for example, by replacing servers with more power efficient designs. Eventually, this new charging model will lead to long-term benefits, not just in terms of costs but also in terms of overall environmental impact.
Going back to our original example, if you can replace each of those ten devices with servers that provide similar application throughput but only use 120W, then your total monthly bill will shrink to $208, a saving of $311 per month or $3,732 each year. If those new servers cost $1,000 each, your expenditure is paid for in just over two and a half years. If power costs go up, then this payback time will shorten considerably.
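For anyone who wants to run their own numbers, here is a rough payback sketch in Python. It uses the example figures from this post (not real rates), the function names are hypothetical, and it assumes the bill scales in direct proportion to rated power, which is consistent with the $519 to $208 drop above.

```python
def scaled_monthly_bill(current_bill: float, current_watts: float,
                        new_watts: float) -> float:
    """Estimate the new bill, assuming charges scale linearly with rated power."""
    return current_bill * new_watts / current_watts


def payback_years(hardware_cost: float, monthly_saving: float) -> float:
    """Years needed for the monthly saving to cover the hardware outlay."""
    if monthly_saving <= 0:
        raise ValueError("There is no payback without a positive monthly saving")
    return hardware_cost / (monthly_saving * 12)


if __name__ == "__main__":
    old_bill = 519.0  # dollars per month for ten 300 W devices (example figure)
    new_bill = round(scaled_monthly_bill(old_bill, 300, 120))  # whole dollars
    monthly_saving = old_bill - new_bill
    hardware_cost = 10 * 1000.0  # ten replacement servers at $1,000 each

    print(f"New monthly bill: ${new_bill}")
    print(f"Annual saving: ${monthly_saving * 12:,.0f}")
    print(f"Payback: {payback_years(hardware_cost, monthly_saving):.2f} years")
```

A higher charge per kW increases the monthly saving and so shortens the payback period, which is the effect described above.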
Changing our chargeback model to one that uses power as the basis for floor space makes sense, both for us and for our customers. As older equipment is retired and replaced, we expect to see greater emphasis on power efficiency rather than raw output. Reducing power consumption on individual servers results in a reduction in the total power consumption for the data center, helping to conserve our power bandwidth and minimize our impact on the environment.
Calling all IT Professionals!
We’re looking to set up an exciting competition for anyone who wants to try their hand at building the most power efficient server possible. The details aren’t finalized yet, but expect a blog posting soon about the terms and conditions, and most importantly, the prizes!