Open Bug 1323106 Opened 8 years ago Updated 2 years ago

Investigate impact of CPU power/clock state on build performance

Categories

(Firefox Build System :: General, defect)

Tracking

(Not tracked)

People

(Reporter: gps, Unassigned)

References

(Blocks 1 open bug)

Details

As part of evaluating some dual-socket Xeon machines for use as developer machines, I discovered that a Xeon E5-2637v4 ramps up its CPU clock much more conservatively than e.g. an i7-6700K on Windows 10 (which sets the minimum CPU frequency to 5% or 10% of maximum by default). When I cranked the minimum CPU frequency to 100%, artifact build time dropped from ~170s to ~77s and full build configure dropped from ~165s to ~97s! That's quite significant.

It's worth spending some time investigating the behavior of Xeon clock speeds on Windows 10 and whether we can have `mach doctor` make changes so the frequency ramp up is more aggressive. https://software.intel.com/en-us/articles/power-management-states-p-states-c-states-and-package-c-states might be a good reference.
(In reply to Gregory Szorc [:gps] from comment #0)
> As part of evaluating some dual socket Xeon machines for developer machines,
> I discovered that an Xeon E5-2637v4 ramps up its CPU clock much more
> conservatively than e.g. an i7-6700K on Windows 10 (which sets the minimum
> CPU frequency to 5% or 10% of maximum by default). When I cranked minimum
> CPU to 100%, artifact build time dropped from ~170s to ~77s and full build
> configure dropped from ~165s to ~97s! That's quite significant.

This might be caused by core parking coupled with poor scheduling of jobs. You can try and disable core parking in the BIOS by disabling the C6 power state or use one of the various utilities to disable it in Windows [1]. If disabling core parking yields the same effect as increasing the minimum CPU frequency then you've got your culprit.

I don't think there's much that can be done in mach to work around this (save maybe explicitly setting processor affinity for single-threaded tasks to a single core, but that could hurt just as much under certain scenarios). Windows has a remarkably complex API for dealing with power management [2] so it might be possible to use it to temporarily turn off unwanted features while building. I've never used it though, so don't take my word for it.

Personally I'm using a high-clocked Xeon E3 and I've turned down the E5-based machines that were offered to me because of the generally lower single-thread performance. Far too many of our day-to-day development tasks are capped by ST performance with only a full, non-incremental build of Firefox really benefiting from lots of cores.

[1] https://bitsum.com/parkcontrol/
[2] https://msdn.microsoft.com/en-us/library/windows/desktop/bb968807(v=vs.85).aspx
> I don't think there's much that can be done in mach to work around this
> (save maybe explicitly setting processor affinity for single-threaded tasks
> to a single core, but that could hurt just as much under certain scenarios).
> Windows has a remarkably complex API for dealing with power management [2]
> so it might be possible to use it to temporarily turn off unwanted features
> while building. I've never used it though, so don't take my word for it.

powercfg.exe offers the -setactive and -import options which may be useful here. (The -change switch looks like it would be really handy but it's very limited in what settings it can modify.)
Ah, actually -setacvalueindex was the thing I was looking for. I was able to set my minimum processor state with: powercfg -setacvalueindex $GUID SUB_PROCESSOR PROCTHROTTLEMIN 100
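For anyone who wants to script the command above, here is a minimal Python sketch. It is not an existing mach feature. The function names are hypothetical, and it makes two assumptions: that the `SCHEME_CURRENT` alias can stand in for `$GUID`, and that the scheme has to be re-activated with `-setactive` before the new value takes effect. The commands are built as plain argument lists so they can be inspected without running anything.

```python
import subprocess
import sys


def min_processor_state_commands(percent, scheme="SCHEME_CURRENT"):
    """Return the powercfg invocations that set the AC minimum processor state.

    `scheme` defaults to the SCHEME_CURRENT alias (an assumption; a scheme
    GUID can be passed instead).
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return [
        # Same command as in the comment above, with the scheme filled in.
        ["powercfg", "-setacvalueindex", scheme,
         "SUB_PROCESSOR", "PROCTHROTTLEMIN", str(percent)],
        # Assumption: re-activate the scheme so the new value is applied.
        ["powercfg", "-setactive", scheme],
    ]


def apply_min_processor_state(percent):
    """Run the commands on Windows; do nothing and return False elsewhere."""
    if sys.platform != "win32":
        return False
    for argv in min_processor_state_commands(percent):
        subprocess.check_call(argv)
    return True
```

Calling `apply_min_processor_state(100)` would correspond to the "minimum CPU at 100%" setting measured in comment 0.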
Note that on modern Intel hardware (Broadwell-E, for instance) the effect of "Balanced" (default) vs "High Performance" power modes in Win10 is way bigger than on older processors (e.g. Nehalem). On an overclocked i7-6950X, a clobbered full build takes around 8m20s in High Performance mode vs ~9m50s in Balanced mode.

What you see is simply the processor's power-saving features at work. And I don't think it's limited to Xeons.


mach could automatically switch to High Performance for the duration of the build and then switch back to whatever mode the machine was in before (e.g. Balanced). (It should account for more than one build running at the same time.)
Thanks for all the info in this bug - it's really helpful!

I was pleasantly surprised to see that `powercfg -setacvalueindex` appears to work without admin privileges. So it would certainly be possible to have `mach build` (or any `mach` command for that matter) adjust the power settings for the duration of the mach command. Doing it everywhere might have performance implications, though, as the overhead of invoking multiple processes to query and (re)set state could be noticeable on short-running commands.

I would implement the power settings as a context manager in python/mozbuild/mozbuild/util.py so any Python code could opt into it easily. Unfortunately, I need to focus on Quantum foo right now. If someone writes a patch, I'd gladly review it.
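A rough sketch of what such a context manager could look like, in case it helps whoever picks this up. This is not a patch and nothing here exists in mozbuild: `high_performance_power`, `parse_active_scheme`, and the `run`/`enabled` parameters are hypothetical names. It takes the "switch to High Performance during the build, then switch back" approach suggested earlier in this bug, and it assumes that `SCHEME_MIN` is the powercfg alias for the High Performance plan and that `powercfg -getactivescheme` prints the active scheme's GUID.

```python
import re
import subprocess
import sys
from contextlib import contextmanager

# Matches a GUID such as 381b4222-f694-41f0-9685-ff5bb260df2e.
_GUID_RE = re.compile(r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}")


def parse_active_scheme(output):
    """Extract the scheme GUID from `powercfg -getactivescheme` output."""
    match = _GUID_RE.search(output)
    if match is None:
        raise ValueError("no power scheme GUID found in powercfg output")
    return match.group(0).lower()


def _powercfg(*args):
    # Default runner; tests can inject a fake instead.
    return subprocess.check_output(("powercfg",) + args, universal_newlines=True)


@contextmanager
def high_performance_power(run=_powercfg, enabled=None):
    """Activate High Performance for the `with` body, then restore the old scheme."""
    if enabled is None:
        enabled = sys.platform == "win32"
    if not enabled:
        # No-op on non-Windows platforms.
        yield None
        return
    previous = parse_active_scheme(run("-getactivescheme"))
    # Assumption: SCHEME_MIN is the alias for the High Performance plan.
    run("-setactive", "SCHEME_MIN")
    try:
        yield previous
    finally:
        # Restore even if the build raised.
        run("-setactive", previous)
```

A caller would wrap the expensive work in `with high_performance_power(): ...`. The sketch does not handle concurrent builds: if two builds overlap, the first one to finish restores the original scheme while the second is still running. It also costs three extra process launches per use, which is the overhead concern raised above for short-running commands.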
markco: I'm not sure how power management in EC2 instances works (I /think/ Amazon controls power on all but the c4.8xlarge instances, which allow you to control C-state and P-state), but given the performance implications, you may want to check the power settings on our Windows AMIs to be sure we're maxing out power/perf.

Also, power settings will almost certainly be relevant on our dedicated hardware Windows instances. It might even cause variance in Talos results!
Blocks: fastci
Flags: needinfo?(mcornmesser)
I would like to experiment with this power setting on our new talos machines we will be setting up in Q1.  Possibly we can run some experiments with different settings and collect a lot of data to see how [un]stable the results are.  Thanks for cc'ing me on this :gps.
bholley posted on dev-platform about power on Linux:

> In a fresh Ubuntu install, there are two available frequency governors,
> "powersave" and "performance". The default is "powersave", which seems
> suboptimal on a Desktop Xeon. The intel_pstate driver doesn't support
> manually pegging the clock, but the "performance" governor seems generous
> enough that it probably doesn't matter.
> 
> Installing cpufrequtils and then setting the governor in
> /etc/init.d/cpufrequtils to "performance" seemed to do the trick.
> You can get a live read on clock speeds with cpufreq-aperf, which
> should show all logical CPUs pegged to/near their max during a
> clobber build.
>
> Changing this seemed to take a clobber build from 8:45 to 8:30,
> though I didn't remeasure in powersave.

Good times.
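On the Linux side, a check along these lines could tell a developer whether any CPU is not using the "performance" governor. It is only a sketch: the function names are hypothetical, and it assumes the kernel exposes the standard cpufreq sysfs files at `/sys/devices/system/cpu/cpuN/cpufreq/scaling_governor`. It only reads the current state and changes nothing.

```python
import glob
import os


def read_governors(sysfs_root="/sys/devices/system/cpu"):
    """Map each CPU directory name (e.g. 'cpu0') to its current governor."""
    governors = {}
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "cpufreq", "scaling_governor")
    for path in sorted(glob.glob(pattern)):
        # path is <root>/cpuN/cpufreq/scaling_governor; take the cpuN part.
        cpu = os.path.basename(os.path.dirname(os.path.dirname(path)))
        with open(path) as fh:
            governors[cpu] = fh.read().strip()
    return governors


def cpus_not_in_performance(governors):
    """Return the CPUs whose governor is anything other than 'performance'."""
    return sorted(cpu for cpu, gov in governors.items() if gov != "performance")


if __name__ == "__main__":
    slow = cpus_not_in_performance(read_governors())
    if slow:
        print("CPUs not using the performance governor: " + ", ".join(slow))
```

On a machine without cpufreq support the glob matches nothing, so the script prints nothing rather than failing.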
Arr and I spoke about this last week. We are going to try to avoid spending more time on buildbot worker configuration, so I will take a look at this with the Taskcluster AMIs. I will also keep it in mind as I work on configuring and tuning the hardware.
Flags: needinfo?(mcornmesser)
Product: Core → Firefox Build System
Severity: normal → S3