It’s key to first understand why power management is important. Data centers are notoriously inefficient when operating at low loads, which is how most of them operate: at about 6 to 12% of full load (Glanz, 2012). Unfortunately, idle servers draw more than half the power of fully active servers (Meisner, 2009). There are two areas where the energy use of a data center can be improved: the IT equipment and the data center infrastructure (cooling and power distribution systems). Adaptive power management focuses on IT equipment, but reducing server energy use inherently reduces the energy use of the infrastructure equipment as well, since less power must be distributed and less heat removed.
There are some widespread misconceptions about server energy use. For some equipment, energy use is roughly proportional to useful work: the more energy used, the more heating, lighting, or pumping is performed. Servers, however, typically draw more than half of their maximum power while performing little or no useful work (Pflueger, 2010 Pg 7) (WSU, 2013). Newer servers are being developed that use only 35% of peak power while idling (Bhattacharya, 2012). Although utilization rates can be improved with virtualization and other data management strategies, doing so faces challenges. One is the need to accommodate customers’ occasional peak demands for computing power. Another is critical data that isn’t easily virtualized (Meisner, 2009 Pg 1). However, most of the new data being produced in the world is either read-only or newly written rather than updated, which makes it a good target for power management: replicas can be powered down without adversely affecting data access (Ganesh, 2013 Pg 5).
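To make the cost of non-proportional power use concrete, here is a minimal sketch. It assumes a linear power model, P(u) = P_idle + (P_peak - P_idle) * u, which is a common simplification rather than a result from the cited sources. The idle fractions of 60% (an illustrative value consistent with "more than half") and 35% (the newer-server figure above) are plugged in purely for illustration.

```python
# Sketch: linear server power model (an assumed simplification, not from the cited sources).
# P(u) = P_idle + (P_peak - P_idle) * u, with utilization u in [0, 1].


def server_power(util: float, peak_w: float, idle_fraction: float) -> float:
    """Return power draw in watts at utilization `util`."""
    if not 0.0 <= util <= 1.0:
        raise ValueError("utilization must be in [0, 1]")
    idle_w = idle_fraction * peak_w
    return idle_w + (peak_w - idle_w) * util


def relative_efficiency(util: float, idle_fraction: float) -> float:
    """Useful work per watt at `util`, relative to work per watt at full load."""
    if util == 0.0:
        return 0.0  # no useful work at all
    # With peak power normalized to 1, work per watt at full load is exactly 1.
    return util / server_power(util, 1.0, idle_fraction)


if __name__ == "__main__":
    for idle_fraction in (0.60, 0.35):  # hypothetical idle fractions
        for util in (0.1, 0.5, 1.0):
            print(f"idle={idle_fraction:.0%} util={util:.0%} "
                  f"efficiency={relative_efficiency(util, idle_fraction):.2f}")
```

Under this model, a server idling at 60% of peak delivers only about 16% of its full-load work per watt at 10% utilization (0.1 / 0.64). At a 35% idle fraction, the figure rises to about 24%.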
Power management can be tricky, though. Typical idle periods, although frequent, last a few seconds or less, so capturing savings from them requires more sophisticated, fast-acting management systems (Meisner, 2009 Pg 1).
Saving energy is the primary, but not the only, benefit of power management. Half of the respondents in the Roadmap survey discussed below anticipated capacity constraints in one of their data centers within the next 24 months and worried about a lack of power and/or cooling capacity in the face of explosive load growth. Power management can help reduce the need for power and cooling capacity (Pflueger, 2010 Pg 17).
There are two primary approaches to addressing the energy waste of underutilized servers. One is to use virtualization to consolidate complementary workloads onto fewer servers, so that those servers run at a much higher utilization rate (WSU, 2013). The other is to use power management to identify idle servers and power them down, or at least put them into hibernation (Pflueger, 2010 Pg 7). Both can generate significant energy savings, and both can be implemented together. However, their savings are not additive: implementing one reduces the savings potential of the other.
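The non-additivity can be illustrated with a toy fleet model. This is a sketch under stated assumptions. Each server is assumed to draw peak power while busy and either idle or sleep power otherwise. Power management is assumed to capture every idle interval, which the short idle periods described above make optimistic. Total busy time is assumed to be conserved when workloads are consolidated. All wattages and fleet sizes are hypothetical.

```python
# Sketch: why virtualization and power management savings are not additive.
# All wattages and fleet sizes are hypothetical.
from dataclasses import dataclass


@dataclass
class ServerModel:
    peak_w: float   # draw while busy
    idle_w: float   # draw while powered on but idle
    sleep_w: float  # draw in a low-power state (power management enabled)


def fleet_power(model: ServerModel, hosts: int, busy_server_equiv: float,
                sleep_when_idle: bool) -> float:
    """Average fleet power (W) when `busy_server_equiv` of work is spread over `hosts`.

    busy_server_equiv = original server count x average utilization, i.e. total busy
    time expressed in server-equivalents; consolidation does not change it.
    """
    busy = busy_server_equiv / hosts
    if not 0.0 <= busy <= 1.0:
        raise ValueError("work does not fit on this many hosts")
    not_busy_w = model.sleep_w if sleep_when_idle else model.idle_w
    # Assumes every idle interval is captured by the sleep state (optimistic).
    return hosts * (busy * model.peak_w + (1.0 - busy) * not_busy_w)


model = ServerModel(peak_w=300.0, idle_w=180.0, sleep_w=15.0)
work = 100 * 0.10  # 100 servers at 10% average utilization

baseline = fleet_power(model, 100, work, sleep_when_idle=False)
power_mgmt = fleet_power(model, 100, work, sleep_when_idle=True)
virtualized = fleet_power(model, 20, work, sleep_when_idle=False)  # 20 hosts at 50%
both = fleet_power(model, 20, work, sleep_when_idle=True)

saved_pm = baseline - power_mgmt
saved_virt = baseline - virtualized
saved_both = baseline - both
print(f"baseline {baseline:.0f} W")
print(f"power management alone saves {saved_pm:.0f} W")
print(f"virtualization alone saves {saved_virt:.0f} W")
print(f"both together save {saved_both:.0f} W, not {saved_pm + saved_virt:.0f} W")
```

With these made-up numbers, power management alone saves about 14,850 W and virtualization alone saves about 14,400 W. Together they save about 16,050 W, far less than the naive sum of 29,250 W, which would exceed the 19,200 W baseline itself.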
It is also important to understand what is actually meant by power management. “Power management” is not a single technology but a collection of technologies and strategies that combine to reduce the energy use of idle servers. Which features are available in a data center depends partly on the age of the servers: the newer the servers, the more power management options can be implemented. Furthermore, the features can be applied at the component, system, rack, and/or data center level, and each level interacts with the others. Different power management features have different potential impacts on server performance. It is therefore important to understand these impacts and to select products that minimize adverse effects on the performance criteria that matter most to the data center manager and the end-use customers. For more on this, see the “End User Drawbacks” section of this document.
The following are examples of power management technologies and strategies.
Server Workload Consolidation
Computing workloads can be consolidated onto fewer active servers, first through periodic virtualization and then through “load localization,” a dynamic consolidation of workloads among the remaining servers. Using load localization, data center operators can maximize both the number of idle servers and the length of time they stay idle. This maximizes the potential energy savings from powering down idle servers and their associated infrastructure. Savings are greatest when all the servers within a “power cycle unit” (PCU) can be powered down together with their associated cooling and power distribution equipment. Spreading data replicas across PCUs for redundancy makes this possible (the industry standard is three replicas of each piece of data): one PCU can be powered down without affecting the availability of its data (Ganesh, 2013 Pg 3-4) (WSU, 2013).
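One way to decide which PCUs can be powered down is to choose a small set of PCUs that still holds at least one replica of every data item. The sketch below uses a standard greedy set-cover heuristic as a stand-in for illustration; it is not the method from Ganesh (2013). The PCU and item names are hypothetical.

```python
# Sketch: choose which power cycle units (PCUs) must stay powered so every data
# item keeps at least one live replica. Greedy set cover is a swapped-in heuristic
# for illustration, not the cited authors' algorithm; names are hypothetical.


def choose_active_pcus(placement: dict[str, set[str]]) -> set[str]:
    """placement maps each data item to the set of PCUs holding one of its replicas."""
    if any(not pcus for pcus in placement.values()):
        raise ValueError("every item needs at least one replica")
    # Invert the mapping: PCU -> items it hosts.
    hosted: dict[str, set[str]] = {}
    for item, pcus in placement.items():
        for pcu in pcus:
            hosted.setdefault(pcu, set()).add(item)

    uncovered = set(placement)
    active: set[str] = set()
    while uncovered:
        # Keep the PCU covering the most still-uncovered items; iterating in
        # sorted order makes ties resolve alphabetically (deterministic output).
        best = max(sorted(hosted), key=lambda pcu: len(hosted[pcu] & uncovered))
        active.add(best)
        uncovered -= hosted[best]
    return active


# Three replicas per item, spread across six PCUs.
placement = {
    "x1": {"A", "B", "C"},
    "x2": {"A", "D", "E"},
    "x3": {"B", "D", "F"},
    "x4": {"C", "E", "F"},
}
active = choose_active_pcus(placement)
all_pcus = set().union(*placement.values())
print("keep powered:", sorted(active))
print("power down:", sorted(all_pcus - active))
```

Here two of the six PCUs are enough to keep every item reachable. Note that this leaves only one live replica per item, so a production policy would likely require more live replicas for fault tolerance.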
Because all the components in all the servers in a data center are never fully active at the same time, data center designers can size power and cooling provisioning to the actual observed peak power use. This peak is likely to be much lower than the sum of the servers’ nameplate ratings. Sizing this way saves on provisioning equipment costs and also lets that equipment operate at a higher fraction of its capacity, where it is more efficient. However, measures are needed to ensure that changes in IT equipment and software don’t push demand above the provisioned power and cooling capacity, which could cause equipment failures. See Power Capping below for more information on this (Bhattacharya, 2012).
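The sizing arithmetic is simple. The sketch below assumes capacity is set to the maximum of a measured aggregate power trace plus a safety headroom. The server rating, the trace values, and the 15% headroom are all hypothetical.

```python
# Sketch: size power provisioning from measured aggregate demand instead of summed
# nameplate ratings. The rating, trace, and 15% headroom are hypothetical.


def provisioned_capacity_w(trace_w: list[float], headroom: float) -> float:
    """Observed peak of a measured power trace plus a safety margin."""
    if not trace_w:
        raise ValueError("need at least one measurement")
    return max(trace_w) * (1.0 + headroom)


nameplate_w = 40 * 500.0  # 40 servers rated at 500 W each
trace_w = [8200.0, 9100.0, 11800.0, 10400.0, 9700.0]  # measured aggregate samples
capacity = provisioned_capacity_w(trace_w, headroom=0.15)
print(f"nameplate total: {nameplate_w:.0f} W")
print(f"provision for:   {capacity:.0f} W "
      f"({1 - capacity / nameplate_w:.0%} less than nameplate)")
```

In this example, the provisioned 13,570 W is about 32% below the 20,000 W nameplate total. If future demand grows past that figure, a power cap, rather than the infrastructure, has to absorb the excess.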
Most modern servers provide configurable sleep states, which greatly reduce a computer’s power requirements while it is idle. Sleep states are common in mobile devices, laptops, and desktop computers, but they are rarely used in current servers because of a perception of unacceptably long wake-up delays. Also, unlike consumer devices, servers have no user present to decide when it’s time to wake up (Meisner, 2009 Pg 3-4) (Pflueger, 2010 Pg 7).
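Energy is a further reason why short idle periods are hard to exploit with conventional sleep states, alongside wake-up delay. Entering and leaving sleep costs time and energy, so sleeping only saves energy if the idle period exceeds a break-even length. The sketch below computes that break-even length. All wattages and transition figures are hypothetical.

```python
# Sketch: break-even idle duration for entering a sleep state.
# All wattages and timings below are hypothetical.


def break_even_idle_s(idle_w: float, sleep_w: float,
                      transition_s: float, transition_j: float) -> float:
    """Shortest idle period (s) for which sleeping uses no more energy than staying idle.

    transition_s / transition_j: total time and energy to enter plus exit sleep.
    Staying idle for T seconds costs idle_w * T; sleeping costs
    transition_j + sleep_w * (T - transition_s). Equating the two and solving gives T.
    """
    if idle_w <= sleep_w:
        raise ValueError("sleep must draw less power than idle")
    t = (transition_j - sleep_w * transition_s) / (idle_w - sleep_w)
    # An idle period can never be shorter than the transitions themselves.
    return max(t, transition_s)


t_min = break_even_idle_s(idle_w=150.0, sleep_w=10.0, transition_s=10.0, transition_j=2000.0)
print(f"sleep only saves energy for idle periods longer than {t_min:.1f} s")
```

With these assumed figures, the break-even point is about 13.6 s. That is longer than the idle periods of a few seconds or less described earlier.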
Power capping is primarily used to protect racks of servers from damaging power spikes by limiting the amount of power that can be supplied to that equipment. Most new servers have power capping mechanisms, and most system management software can take advantage of them. With automated, granular power and temperature monitoring, dynamic power capping adjusts the cap to match the workload and modifies CPU frequencies on the fly. However, not all equipment is compatible with power capping; some legacy hardware may be unable to respond to a power cap (Klaus, 2013). Power capping is also used in data centers whose power and cooling capacity is sized to actual observed peak power rather than to equipment nameplate ratings. There, it prevents spikes in software activity from exceeding the provisioned infrastructure capacity and causing system failures. In this role, power capping must be fast and stable enough to track dynamic changes in power demand (Bhattacharya, 2012).
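The sketch below shows a simple reactive control loop that adjusts CPU frequency on the fly. Its purpose is to make the speed and stability requirements concrete. The frequency table, the toy power model (fixed power plus a term scaling with load and frequency cubed), the cap, and the hysteresis margin are all hypothetical. Real servers implement capping in vendor firmware and management software rather than with code like this.

```python
# Sketch: a reactive dynamic power-capping loop that steps the CPU between
# frequency levels to keep measured power under a cap. The frequency table, toy
# power model, cap, and margin are hypothetical.

FREQS_GHZ = [3.0, 2.6, 2.2, 1.8, 1.4]  # index 0 = fastest


def modeled_power_w(freq_ghz: float, load: float) -> float:
    """Toy model: 120 W fixed plus dynamic power scaling with load and frequency cubed."""
    return 120.0 + 180.0 * load * (freq_ghz / FREQS_GHZ[0]) ** 3


def next_index(index: int, measured_w: float, cap_w: float, margin_w: float) -> int:
    """Slow down when over the cap; speed up only when well under it (hysteresis)."""
    if measured_w > cap_w:
        return min(index + 1, len(FREQS_GHZ) - 1)
    if measured_w < cap_w - margin_w:
        return max(index - 1, 0)
    return index


def simulate(loads: list[float], cap_w: float, margin_w: float) -> list[tuple[float, float]]:
    """Return (frequency used, measured power) for each control interval."""
    index, history = 0, []
    for load in loads:
        freq = FREQS_GHZ[index]
        power = modeled_power_w(freq, load)
        history.append((freq, power))
        index = next_index(index, power, cap_w, margin_w)
    return history


history = simulate([1.0] * 4 + [0.5] * 4, cap_w=220.0, margin_w=40.0)
for freq, power in history:
    print(f"{freq:.1f} GHz -> {power:.0f} W")
```

In this run, the first two intervals exceed the 220 W cap before the loop settles at 2.2 GHz. Once load halves, the CPU climbs back to 3.0 GHz. That lag is why capping must react quickly. The 40 W hysteresis band keeps the loop stable: with a 20 W band, this toy setup keeps bouncing between 2.2 and 2.6 GHz.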
PowerNap addresses the fact that server idle periods may last only a few seconds or less. It transitions an entire blade server system between a high-performance active state and a near-zero-power idle state (6% of peak power) in response to instantaneous load. Dedicated system components signal the beginning and completion of computing work.
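A rough average-power comparison shows what rapid napping can buy. The 6%-of-peak nap power comes from the text above. Everything else is a labeled assumption: the peak and idle wattages, the number of transitions per second, the transition time, and the premise that transitions draw peak power.

```python
# Sketch: average power of a server that naps during idle time versus one that
# idles conventionally. Nap power = 6% of peak (from the text); all other figures
# are hypothetical, and transitions are assumed to draw peak power.


def avg_power_conventional(util: float, peak_w: float, idle_w: float) -> float:
    """Busy at peak power, otherwise idle at idle_w."""
    return util * peak_w + (1.0 - util) * idle_w


def avg_power_napping(util: float, peak_w: float, nap_w: float,
                      transitions_per_s: float, transition_s: float) -> float:
    """Busy or transitioning at peak power, otherwise napping at nap_w."""
    idle = 1.0 - util
    in_transition = transitions_per_s * transition_s  # fraction of time spent switching
    if in_transition > idle:
        raise ValueError("transitions would consume more than the idle time")
    return (util + in_transition) * peak_w + (idle - in_transition) * nap_w


peak_w = 300.0
conventional = avg_power_conventional(0.10, peak_w, idle_w=0.60 * peak_w)
napping = avg_power_napping(0.10, peak_w, nap_w=0.06 * peak_w,
                            transitions_per_s=10.0, transition_s=0.001)
print(f"conventional idle: {conventional:.1f} W, napping: {napping:.1f} W")
```

Under these assumptions, napping cuts average power from 192 W to about 49 W. The saving shrinks as transitions become slower or more frequent.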
The Redundant Array for Inexpensive Load Sharing (RAILS) power supply approach is more efficient across the whole range of power needed, raising average efficiency from 68% to 86%. The Northwest Energy Efficiency Alliance’s 80 PLUS program reached 70% market penetration among desktop computers by 2012 and may have influenced server power supplies as well. However, 80 PLUS sets efficiency thresholds only at typical load points, not at all the load levels that some power management approaches, such as PowerNap, rely on.
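The efficiency gain can be translated into wall power using P_in = P_out / η. As a worked example, assume a hypothetical 200 W DC load, treat the two average efficiencies as if they applied at that single operating point, and ignore downstream cooling effects:

```latex
P_{\text{in}} = \frac{P_{\text{out}}}{\eta}, \qquad
\frac{P_{\text{in}}(\eta = 0.86)}{P_{\text{in}}(\eta = 0.68)} = \frac{0.68}{0.86} \approx 0.79

P_{\text{out}} = 200\ \text{W}: \quad
\frac{200\ \text{W}}{0.68} \approx 294\ \text{W}
\;\longrightarrow\;
\frac{200\ \text{W}}{0.86} \approx 233\ \text{W}
```

The same load therefore draws roughly 21% less input power.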
Researchers are working to make server energy use more proportional to work output. Processor dynamic voltage and frequency scaling (DVFS, also known as processor throttling) provides savings under reduced loads. Because supply voltage can be lowered along with clock frequency, processor dynamic power can fall by up to roughly the cube of the relative speed. DVFS is a useful contributor to overall energy savings in modern servers, but processors account for only about a quarter of total server power. Efforts are underway to extend energy-proportional computing from DVFS to the entire server, although some components have inherent fixed losses, which limits this expansion (Meisner, 2009 Pg 1-4, 8-10).
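The cube relationship and the quarter-share caveat can be combined in a rough worked estimate. This assumes the textbook CMOS dynamic-power relation, voltage scaling in proportion to frequency, and that all processor power is dynamic (static leakage is ignored). The symbols c (processor share of server power) and s (relative clock speed) are introduced here for illustration.

```latex
P_{\text{dyn}} \approx C\,V^{2} f, \quad V \propto f
\;\Rightarrow\; P_{\text{dyn}} \propto f^{3}

\frac{P_{\text{server}}(s)}{P_{\text{server}}(1)} = (1 - c) + c\,s^{3}, \qquad
c = 0.25,\ s = 0.5:\quad 0.75 + 0.25 \times 0.125 \approx 0.78
```

Under these assumptions, halving the clock speed cuts processor power to about one-eighth but reduces total server power by only about 22%.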
Barriers to Implementation
Finally, it’s important to understand how data center professionals view power management and why it isn’t implemented more widely. The Roadmap (TGG) interviewed 20 data center professionals from 19 organizations about their understanding and implementation of server power management systems. Respondents included end users, product vendors, and consultants. The survey identified several potential obstacles to more effective and widespread implementation of power management features (Pflueger, 2010 Pg 4-6). Twenty respondents is a small sample from which to characterize a national industry. However, their organizations managed a significant number of servers: from 250 to over 100,000 per organization, totaling around 500,000 servers, not counting related storage and networking equipment. These server populations also ranged from almost completely homogeneous to heterogeneous (drawn from a wide range of manufacturers) (Pflueger, 2010 Pg 15).
The first obstacle is a lack of understanding. Many respondents misunderstood the technical details of power management and how to implement it, as well as how to account for its benefits. Only a third of respondents had implemented any form of power management, and all but one of them felt they lacked the tools to quantify the resulting energy savings.
Another obstacle is “split incentives,” whereby data center managers are neither supported nor incentivized to cut energy use and so don’t benefit from doing so. Such support would include equipping data center staff to research and explore options, invest in implementation, and track savings, and then letting them receive some benefit from the results. The problem can persist even when a facility or energy manager is interested in reducing server energy use but lacks the authority to do so. It is not uncommon for data center managers to lack access even to utility bills. Nor do data center managers receive any incentives from customers to implement power management features. Understandably, then, most managers focus on their service level agreements (SLAs) with customers, which emphasize data reliability and performance.
Some respondents were concerned about the impact of power management on server availability and performance. One noted that vendors typically shipped equipment with power management features disabled. This respondent interpreted the practice as reflecting a general lack of confidence that power management would not harm system reliability and performance.
Finally, some identified the challenge of assessing server power use across a wide variety of server and application types. They wanted a single tool to handle this, rather than having to learn different power management systems for different equipment types. For example, while most servers have sleep/hibernation states, a server may use several families of power states, including G, S, C, D, P, CC, and LL states.
Spurring implementation of power management seems to require education and outreach. It also requires methods that make it easier to collect power and energy data and to control power management across a variety of equipment. The next step is to quantify the benefits in terms of reductions in both energy use and capital expenditures. It is also important for data center managers to negotiate SLAs with customers that do not preclude reasonable power management features. Encouraging the behavioral intention to adopt any new technology requires building a perception of both usefulness and ease of use. Both are critical, but ease of use matters more. Strategies for encouraging adoption therefore need to make implementation and documentation as easy as possible (Pflueger, 2010 Pg 4-6).