Use PogoSafeMode for PGO builds on Windows

RESOLVED FIXED in Firefox 47

Status

defect
RESOLVED FIXED
3 years ago
Last year

People

(Reporter: ehoogeveen, Assigned: ted)

Tracking

Trunk
mozilla47
Unspecified
Windows

Firefox Tracking Flags

(firefox47 fixed)

Details

Attachments

(1 attachment)

I decided to try a PGO build locally and noticed that we get a lot of "Expectation failed" messages. Searching for that message reveals that MSVC has two profiling modes: 'fast' and 'safe'. For multi-threaded code, apparently safe mode is supposed to be used, but it defaults to the fast mode.

To switch to safe mode, the PogoSafeMode flag must be passed. It isn't entirely clear to me *how* this flag is supposed to be passed: the documentation [1] says to set it as an environment variable on x86, and pass it as a linker flag on x64 - but the answer to a Stack Overflow question says to pass it to the compiler instead [2].

Setting this option might slow down the profiling step, but might also help stability of the optimizations. It seems Chrome also made this change last year [3] (as a linker option, incidentally). I'm not really familiar enough with the build system to know where to set this, but I can obviously help test it. I can also prepare a patch if someone points me in the right direction, but it might be faster for an expert to do :)

[1] https://msdn.microsoft.com/en-us/library/ee390848.aspx
[2] http://stackoverflow.com/questions/14345715/what-are-these-expectation-failed-messages-in-vs2010-pgo-and-how-do-i-fix-them
[3] https://groups.google.com/a/chromium.org/forum/#!topic/chromium-reviews/5TQZUc6lYNg
It looks like it *should* be passed to the linker - at least, all variants of link.exe for Visual Studio 2013 list /PogoSafeMode in their usage description.
(cc froydnj and ehsan as I have a feeling they may be interested in this)

Thank you for filing this bug. It certainly looks like /PogoSafeMode is on paper something we should add to ensure PGO isn't creating bad optimizations that could cause instability problems (read: crashes). I'm not thrilled about creating PGO builds getting even slower than they already are. But if we have to trade stability for build performance, I'm pretty sure that's not even a debate.

I'm performing some PGO "benchmarks" on my Windows desktop at home and may investigate the performance of adding /PogoSafeMode. In the mean time, someone may want to push a PGO build to Try to see what the automation performance impact and any Talos fallout is.
Assignee: nobody → ted
AFAICT this should only impact the actual profiling step--this is putting locks in the profile gathering code. Looking at a recent PGO build log (http://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-inbound-win32-pgo/1456317101/mozilla-inbound-win32-pgo-bm70-build1-build30.txt.gz) shows that this currently takes about 6 minutes:

06:15:14     INFO -  MOZ_PGO_INSTRUMENTED=1 JARLOG_FILE=jarlog/en-US.log EXTRA_TEST_ARGS=10 c:/builds/moz2_slave/m-in-w32-pgo-00000000000000000/build/src/mozmake.EXE -C c:/builds/moz2_slave/m-in-w32-pgo-00000000000000000/build/src/obj-firefox pgo-profile-run
06:15:14     INFO -  mozmake.EXE[2]: Entering directory 'c:/builds/moz2_slave/m-in-w32-pgo-00000000000000000/build/src/obj-firefox'
06:15:14     INFO -  c:/builds/moz2_slave/m-in-w32-pgo-00000000000000000/build/src/obj-firefox/_virtualenv/Scripts/python.exe c:/builds/moz2_slave/m-in-w32-pgo-00000000000000000/build/src/build/pgo/profileserver.py 10
<...>
06:21:17     INFO -  mozmake.EXE[2]: Leaving directory 'c:/builds/moz2_slave/m-in-w32-pgo-00000000000000000/build/src/obj-firefox'
06:21:17     INFO -  c:/builds/moz2_slave/m-in-w32-pgo-00000000000000000/build/src/mozmake.EXE -f c:/builds/moz2_slave/m-in-w32-pgo-00000000000000000/build/src/client.mk maybe_clobber_profiledbuild CREATE_MOZCONFIG_JSON=

Unless this change makes it take significantly longer it's not likely to be a problem.
Comment on attachment 8723214 [details]
MozReview Request: bug 1249923 - use PogoSafeMode for MSVC PGO. r?gps

https://reviewboard.mozilla.org/r/36409/#review32997

Even if this causes build time regressions, I don't think I have a choice about approving this since it could improve stability.
Attachment #8723214 - Flags: review?(gps) → review+
Looking at the try builds from yesterday, this doesn't seem to have changed the time to run the profiling scenario in a noticeable way. It's still about 6 minutes.
On my i7-6700K desktop with VS2013, adding just /PogoSafeMode (no /cgthreads) actually made the profile instrumentation linking a few minutes *faster*. Sample size of 1, so not statistically significant. But, both builds hit the libxul link within 20s elapsed time of each other (~20 minutes into the build) so it certainly appears the difference is isolated to /PogoSafeMode.

Before: 43:52 to complete PGO instrumentation link
After:  39:05

Full PGO build time:

Before: 95:18
After:  87:43

Again, only sample size of 1.

The addition of this flag also caused ~5000 warnings to disappear. Many of them appear to be in the JS engine. 2873 "Expectation failed." 1846 "Arc N --> M has negative count." 318 "Block N incoming counts differ." 279 "Block N outgoing counts differ from block count." None of these warnings appear at all with /PogoSafeMode.
Maybe the warnings themselves were slowing it down? IIRC the Windows terminal is pretty slow. Great news either way, thanks guys :)
Though I suppose, with that big of a difference, it's more likely due to something internal (like correcting the inconsistencies).
(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #10)
> Maybe the warnings themselves were slowing it down?

I redirected output to a file to mitigate this. And, it was pretty clear that the PGO instrumentation phase itself (which prints nothing) was a few minutes faster.

Comment 14

3 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/8986592ec954
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla47

Updated

Last year
Product: Core → Firefox Build System
You need to log in before you can comment on or make changes to this bug.