Note: This is the temporary home of the “No Risk HPM” draft; please link to via.vmw.com/no-risk-hpm instead.
“Power Management never worked for us”
I’ve talked to a long list of customers who tried some form of power management but rolled back because of an unacceptable performance impact on some, usually critical, applications. In every single one of these discussions and subsequent tests, we managed to arrive at a configuration that met both main criteria: no measurable performance impact when running with the ESXi Balanced policy, and guaranteed BIOS Maximum Performance behavior when switching to ESXi High Performance (in the unlikely event that this should become necessary).
Most of those earlier failures were methodology-related: for example, starting from a dynamic or default BIOS power profile and then only switching P-States and C-States to “OS Control”, while keeping the profile’s other defaults, which are more power efficient but cost performance. Some were due to configured energy capping or PSU redundancy issues and policies, and a couple were caused by old BIOS/UEFI versions with known issues that have long since been fixed.
If you are in the “we tried it and it didn’t work” camp, I encourage you to give it another go with an up-to-date BIOS, following the configuration guidelines below.
OS Control
The recommendation to configure the BIOS to “OS Control” and let ESXi use its own Power Policies is widely publicized and correct in principle, but sadly not granular enough. For one, the terminology varies widely between hardware vendors, and sometimes even the policy names differ depending on the interface (BIOS vs. LOM). “OS Control” might refer to a power policy, or it might be the value of individual options such as P-States or the Intel Energy Performance Bias.
C1E
Enhanced C1 (C1E) uses CPU model-specific heuristics to potentially idle deeper than C1. This is completely transparent to the OS, and while it might save some power, especially when the OS is not configured to use deep C-States itself, it also makes the C1 wake-up latency longer and non-deterministic. It should therefore be disabled to ensure that even the most latency-sensitive workloads are not impacted by the base configuration.
Different vendors, different options
This is not meant to be a complete list; the process for other vendors is pretty much the same (with some exceptions covered below).
Dell
Under System BIOS Settings – System Profile Settings, everything should look like the “Maximum Performance” System Profile (but now set to “Custom”) except for:
- CPU Power Management should be set to “OS DBPM”
- C States should be set to “Enabled”
That’s it!
HPE
Under BIOS/Platform Configuration – Power Management, the main and Advanced Power Options should look as if the Power Profile were set to “Maximum Performance” (but now set to “Custom”), except for:
- Power Regulator should be set to “OS Control” mode
- Minimum Processor Idle C-State should be set to “C6 State”
- Minimum Processor Idle Package C-State should be set to “Package C6 (retention) State”
Because there is no separate option for C1E (it is implicitly included when configuring C6), you will have to disable it via the ESXi boot option disableC1E if you want full control through the ESXi Power Policies.
Lenovo
Under Operating Modes, everything should look like the “Maximum Performance” setting of Choose Operating Mode (but now set to “Custom Mode”), except for:
- CPU P-State Control should be set to “Legacy”
- C-States should be set to “Legacy”
- Power/Performance Bias should be set to “OS Controlled”
- MONITOR/MWAIT should be set to “Enabled”
Under Processors, Uncore Frequency Scaling should be set to “Disabled”.
Note that the Lenovo BIOS/UEFI replicates settings: any adjustment under Operating Modes also changes the same settings in other menus, e.g. Processors and Power.
Fujitsu
Under Application Profile, select what you want as the baseline, e.g. “Total Throughput Performance”, and change:
- HWPM Support should be set to “Disabled”
- CPU C1E Support should be set to “Disabled”
- CPU C6 Report should be set to “Enabled”
- Package C State limit should be set to “C6 (Retention)”
While unrelated to Power Management: SNC (Sub-NUMA Clustering) usually only benefits very specific workloads (e.g. monster VMs). If the initial profile enables it, you probably want to disable it while you are already looking at the BIOS config.
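The vendor-specific overrides above can be kept as a simple checklist. The following Python sketch copies the option names and values from the Dell, HPE, Lenovo and Fujitsu sections; it assumes you have somehow exported the current BIOS settings into a flat dict keyed by option name (that export, and the assumption that option names are unique across menus, are hypothetical). Note that `check_bios` only verifies the overrides, not that every other option still matches the Maximum Performance baseline.

```python
# Overrides from the vendor sections, applied on top of each vendor's
# "Maximum Performance" baseline (Fujitsu: the chosen Application Profile).
# Assumption: current settings are available as a flat {option: value} dict.
VENDOR_OVERRIDES: dict[str, dict[str, str]] = {
    "Dell": {
        "CPU Power Management": "OS DBPM",
        "C States": "Enabled",
    },
    "HPE": {
        "Power Regulator": "OS Control",
        "Minimum Processor Idle C-State": "C6 State",
        "Minimum Processor Idle Package C-State": "Package C6 (retention) State",
    },
    "Lenovo": {
        "CPU P-State Control": "Legacy",
        "C-States": "Legacy",
        "Power/Performance Bias": "OS Controlled",
        "MONITOR/MWAIT": "Enabled",
        "Uncore Frequency Scaling": "Disabled",  # under Processors
    },
    "Fujitsu": {
        "HWPM Support": "Disabled",
        "CPU C1E Support": "Disabled",
        "CPU C6 Report": "Enabled",
        "Package C State limit": "C6 (Retention)",
    },
}

# Vendors without a separate C1E switch: disable C1E via the ESXi boot option.
NEEDS_DISABLEC1E_BOOT_OPTION = {"HPE"}


def check_bios(vendor: str, current: dict[str, str]) -> list[str]:
    """Return one message per override that does not match `current`."""
    problems = []
    for option, expected in VENDOR_OVERRIDES[vendor].items():
        actual = current.get(option)  # None if the option was not exported
        if actual != expected:
            problems.append(f"{option}: expected {expected!r}, found {actual!r}")
    return problems
```

For example, `check_bios("Dell", {"CPU Power Management": "OS DBPM", "C States": "Disabled"})` returns a single message, `"C States: expected 'Enabled', found 'Disabled'"`, and an empty list means all overrides for that vendor are in place.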
TL;DR
At the BIOS level, select the Maximum Performance policy and note the values of all the options it sets. Then configure a custom policy that matches Maximum Performance except for P-States and deep C-States: set the former to OS Control (legacy, not HWP/native) and the latter to Enabled/C6. If the Intel Energy Performance Bias can also be set to OS Controlled, do that; otherwise just stick to Performance. Make sure C1E is disabled: in the BIOS if it has a separate option, otherwise via the ESXi boot options. Finally, set the ESXi Power Policy to Balanced.
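The TL;DR procedure boils down to “copy the Maximum Performance values, then override a handful of options”. The Python sketch below encodes exactly that. The option names (`P-States`, `Deep C-States`, `Energy Performance Bias`, `C1E`) are generic placeholders rather than any vendor’s real menu labels, and `Plan` and `plan_custom_profile` are hypothetical helpers for illustration. The boot option is only listed by name; the exact syntax for setting it is not covered here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Plan:
    bios_settings: dict[str, str]  # full custom BIOS profile
    changed: dict[str, tuple[Optional[str], str]]  # option -> (baseline, new)
    esxi_boot_options: list[str]  # boot options to add on the ESXi side
    esxi_power_policy: str


def plan_custom_profile(
    max_perf_snapshot: dict[str, str],
    epb_os_controllable: bool,
    bios_has_c1e_option: bool,
) -> Plan:
    """Derive a custom profile from a snapshot of Maximum Performance values.

    Option names are generic placeholders; map them to your vendor's menus.
    """
    # Everything not listed here stays exactly as Maximum Performance set it.
    overrides = {
        "P-States": "OS Control (legacy, not HWP/native)",
        "Deep C-States": "Enabled/C6",
        # Hand EPB to the OS if possible, otherwise keep it at Performance.
        "Energy Performance Bias": (
            "OS Controlled" if epb_os_controllable else "Performance"
        ),
    }
    boot_options: list[str] = []
    if bios_has_c1e_option:
        overrides["C1E"] = "Disabled"
    else:
        # No separate BIOS switch: disable C1E from the ESXi side instead.
        boot_options.append("disableC1E")

    custom = dict(max_perf_snapshot)  # copy, so the snapshot stays untouched
    changed = {}
    for option, value in overrides.items():
        if custom.get(option) != value:
            changed[option] = (custom.get(option), value)
        custom[option] = value

    return Plan(custom, changed, boot_options, esxi_power_policy="Balanced")
```

Keeping the snapshot as the starting point is the point of the exercise: any option you did not explicitly override (turbo, uncore, memory settings and so on) retains its Maximum Performance value, which is what avoids the “dynamic profile plus OS Control” trap described at the beginning.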