Closed
Bug 1250797
Opened 8 years ago
Closed 8 years ago
Investigate PGO build times with various configurations
Categories
(Firefox Build System :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: gps, Assigned: gps)
References
Details
Attachments
(3 files)
One of my pieces of homework from the build system meet-up happening right now is to investigate PGO build times with various configurations.

Things to test:

* Performance on my desktop (i7-6700K) vs automation
* Visual Studio 2013 vs 2015
* Impact of /PogoSafeMode (bug 1249923)

ted: what was that command line flag related to number of threads you mentioned earlier today?
Flags: needinfo?(ted)
Assignee
Comment 1•8 years ago
First observation, and it is a big one: my home desktop machine (i7-6700K, one of the fastest consumer-grade CPUs you can currently buy) running Visual Studio 2015 is significantly faster than the fastest PGO build I found in automation.

http://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-central-win32-pgo/1456025532/mozilla-central-win32-pgo-bm72-build1-build16.txt.gz is the fastest 32-bit PGO build job I found in terms of wall time. Relevant times from that log:

19:42:01 (00:00) - invoke `mach build`
19:44:58 (02:57) - configure/config.status finish
20:13:43 (31:42) - libxul PGO instrumentation begins
20:58:45 (76:44) - libxul PGO codegen finishes
... 2nd compilation pass

On my machine (I haven't got it past the first PGO build pass yet):

00:00 - invoke `mach build`
01:39 - configure/config.status finish
15:48 - libxul PGO instrumentation begins
28:27 - libxul PGO codegen finishes

My machine is ~2.7x faster than automation for roughly the same work (76:44 vs 28:27), although it was only ~2x faster before PGO profiling started (31:42 vs 15:48). I am running VS2015 Update 1 (vs VS2013 in automation).

PGO profiling clearly saturates only 1 physical CPU core. My machine is 4.0 GHz/core. I'm not sure what the machine in automation is running, but the Xeons that run in data centers (like AWS) are typically not clocked that high. Hosting providers typically go for dual, quad, hex, octa, etc. processor configurations, and clock speed really drops off as you add more cores (see https://en.wikipedia.org/wiki/List_of_Intel_Xeon_microprocessors). Only the new Skylake-DT processors can reach 4.0 GHz and would be cycle-for-cycle comparable to my i7-6700K. I have no clue if Amazon runs those yet or not. I wouldn't bet money on it.
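The elapsed offsets and speedup ratios quoted in this comment can be re-derived with a short script. This is only a sketch: the helper names (`elapsed_seconds`, `to_seconds`, `mmss`) are made up for illustration, and the timestamps are the ones quoted from the two logs.

```python
from datetime import datetime

def elapsed_seconds(start, end):
    """Seconds between two HH:MM:SS log timestamps on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())

def to_seconds(stamp):
    """Convert an MM:SS offset (minutes may exceed 59) to seconds."""
    minutes, secs = stamp.split(":")
    return int(minutes) * 60 + int(secs)

def mmss(seconds):
    """Format seconds as MM:SS."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes:02d}:{secs:02d}"

# Wall-clock timestamps from the automation log, relative to `mach build`.
automation_start = "19:42:01"
automation = {
    "configure finish": elapsed_seconds(automation_start, "19:44:58"),
    "instrumentation begins": elapsed_seconds(automation_start, "20:13:43"),
    "codegen finishes": elapsed_seconds(automation_start, "20:58:45"),
}

# Offsets measured on the i7-6700K desktop.
desktop = {
    "configure finish": to_seconds("01:39"),
    "instrumentation begins": to_seconds("15:48"),
    "codegen finishes": to_seconds("28:27"),
}

for name in automation:
    ratio = automation[name] / desktop[name]
    print(f"{name}: {mmss(automation[name])} vs {mmss(desktop[name])} ({ratio:.1f}x)")
```

The ratio at "instrumentation begins" is the pre-PGO comparison; the ratio at "codegen finishes" covers the whole first pass.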
Assignee
Comment 2•8 years ago
Shows a log from the first pass of a PGO build on my i7-6700K.
Assignee
Comment 3•8 years ago
I should also mention I did a poor job of sanitizing my machine before measuring. I was actively browsing, etc during the build. I also noticed that Windows Defender was sucking a bunch of CPU during the build. How much, I'm not entirely sure. I wouldn't be surprised if my local build could be 5-10% faster if my computer were otherwise idle and I disabled Windows Defender.
Comment 4•8 years ago
The linker flag is cgthreads:N https://msdn.microsoft.com/en-us/library/dn631956%28v=vs.120%29.aspx
Flags: needinfo?(ted)
Comment 5•8 years ago
Also, it would be good if you could compare VS2013, since we think VS2015 might be faster in general, so you'd be comparing apples to apples.
Assignee
Comment 6•8 years ago
Here's a build log of a full PGO 32-bit build from my i7-6700K using VS2013 Update 4. This is roughly an apples-to-apples comparison with automation. However, it doesn't do symbol generation, packaging, installer generation, etc., so wall-to-wall times are not comparable. You can, however, treat the subset of times covering the PGO build itself as apples to apples.

00:00 - `mach build`
01:44 - configure/config.status complete
17:55 - libxul PGO instrumentation linking begins
40:59 - libxul PGO instrumentation ends
43:32 - pgo-profile-run make target begins
46:57 - pgo-profile-run finishes
47:00 - 2nd build pass begins
49:39 - libxul PGO update pass begins
87:25 - libxul PGO update pass finishes
88:31 - build finishes

I need to go back to the automation log to collect times after the libxul PGO instrumentation run. But my machine finished the initial libxul PGO pass at 40:59 versus 76:44 in automation, so it is still significantly faster. With VS2015 the same machine reached that point at 28:27, versus 40:59 with VS2013!
Assignee
Comment 7•8 years ago
System resource utilization during 32-bit PGO build on VS2013. Drop this file into objdir/.mozbuild/build_resources.json and load it with `mach resource-usage`. Or look at the raw data. Sadly, the section capture from the first pass is lost because we're overwriting e.g. "export" and "compile" with last write wins from the 2nd pass. We should fix that. And we should teach it to explicitly capture PGO phases.
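The "last write wins" problem described here can be illustrated with a small sketch. It is not the mozbuild resource monitor's code: the event tuples, their numbers, and the variable names are hypothetical placeholders chosen only to show why keying phases by name loses the first pass, and how keeping a list per name would preserve both.

```python
from collections import defaultdict

# Hypothetical phase records (name, start, end). The numbers are placeholders,
# not measurements. A PGO build runs "export" and "compile" twice.
events = [
    ("export", 0, 10),
    ("compile", 10, 20),
    ("export", 30, 40),   # second pass reuses the same phase names
    ("compile", 40, 50),
]

# Keyed by name only: the second pass silently overwrites the first.
last_write_wins = {}
for name, start, end in events:
    last_write_wins[name] = (start, end)

# Keeping every occurrence per name preserves both passes.
all_occurrences = defaultdict(list)
for name, start, end in events:
    all_occurrences[name].append((start, end))

print(last_write_wins["compile"])   # only the second pass survives
print(all_occurrences["compile"])   # both passes are kept
```

Explicit PGO phase capture could follow the same idea by recording the pass number alongside the phase name.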
Assignee
Comment 8•8 years ago
catlee, arr: you may find the preliminary figures in this bug interesting.
Comment 9•8 years ago
bug 1250971 covers adding cgthreads:N to the linker commandline.
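As a rough illustration of what the linker command line for each PGO pass might carry, here is a minimal sketch. The helper `pgo_link_flags` is hypothetical and is not taken from the Firefox build system. The flag spellings (`/LTCG:PGINSTRUMENT`, `/LTCG:PGUPDATE`, `/CGTHREADS:N`, `/PogoSafeMode`) are MSVC linker options, and the 1 to 8 range check reflects my reading of the MSDN page linked in comment 4, so treat it as an assumption.

```python
def pgo_link_flags(phase, cgthreads=8, pogo_safe_mode=True):
    """Build an illustrative list of MSVC linker flags for one PGO pass.

    Hypothetical helper for illustration only.
    """
    ltcg_modes = {
        "instrument": "/LTCG:PGINSTRUMENT",  # first pass: instrumented link
        "update": "/LTCG:PGUPDATE",          # second pass: link using profile data
    }
    if phase not in ltcg_modes:
        raise ValueError(f"unknown PGO phase: {phase!r}")
    # Assumption: the documented range for /CGTHREADS is 1 through 8.
    if not 1 <= cgthreads <= 8:
        raise ValueError("cgthreads must be between 1 and 8")
    flags = [ltcg_modes[phase], f"/CGTHREADS:{cgthreads}"]
    if pogo_safe_mode:
        flags.append("/PogoSafeMode")
    return flags

print(" ".join(pgo_link_flags("instrument")))
print(" ".join(pgo_link_flags("update")))
```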
Comment 10•8 years ago
Bruce Dawson (from Google) linked me to the Microsoft bug he filed, which was fixed in VS 2015 and accounts for most of the speedup there: https://connect.microsoft.com/VisualStudio/feedback/details/1064219/ltcg-linking-of-chromes-pdf-dll-spends-60-of-time-in-c2-dll-ssrfree
Assignee
Comment 11•8 years ago
Bug 1249923 comment 9 contains measurements of the impact of /PogoSafeMode. The MSDN docs say it should make things slower. Yet it actually appeared to speed up PGO linking by a handful of minutes on my machine. Weird.
Assignee
Comment 12•8 years ago
Using the patch in bug 1251313 and the workaround for the startup crash in bug 1244006, I was able to perform a full PGO build on VS 2015 Update 1! AFAICT, 2015 is full of wins with no obvious downsides so far!

Using /CGTHREADS:8 (best of 2 runs, which were very similar):

                        VS 2013          VS 2015          VS 2015
                        /PogoSafeMode                     /PogoSafeMode
libxul 1 start          15:36            13:45            13:44
libxul 1 finish         39:16 (23:40)    26:17 (12:32)    26:06 (12:22)
profileserver start     41:40            28:34            28:26
profileserver finish    45:19            31:48            31:53
libxul 2 start          48:01            33:11            33:15
libxul 2 finish         86:22 (38:21)    46:56 (13:45)    47:16 (14:01)
complete                87:24            47:54            48:13

VS 2013 vs 2015 with /CGTHREADS:8 /PogoSafeMode (which will soon be our default configuration):

                        2013     2015     delta
libxul 1 start          15:36    13:44     -1:52
libxul 1 finish         39:16    26:06    -13:10
profileserver start     41:40    28:26    -13:14
profileserver finish    45:19    31:53    -13:26
libxul 2 start          48:01    33:15    -14:46
libxul 2 finish         86:22    47:16    -39:06
complete                87:24    48:13    -39:11

                        2013     2015     delta     2015 as % of 2013
libxul pass 1           23:40    12:22    -11:18    52.3%
libxul pass 2           38:21    14:01    -24:20    36.5%

Compilation is faster. PGO instrumentation is faster. PGO update is faster. Everything is faster!

What's really going to blow your mind is that my machine is able to perform a PGO build on VS 2015 faster than automation is able to perform an incremental non-PGO build on VS 2013. (But my machine is a 4.0 GHz Skylake, and you aren't going to find that on a Xeon chip at a cloud services provider because they don't make Xeons that fast.)

Given the importance of PGO build turnaround time (it is the slowest build, it prevents tests from running sooner, and it therefore slows down chemspills and mozilla-central integration), I think upgrading our automation to VS 2015 should be a high priority unless VS 2015 Talos results show a regression.
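The delta and percent columns can be recomputed from the milestone times. The following is a sketch with made-up helper names (`to_seconds`, `signed_mmss`); the input times are the VS 2013 and VS 2015 /PogoSafeMode columns from this comment, and "percent" is taken to mean the 2015 pass duration as a share of the 2013 pass duration.

```python
def to_seconds(stamp):
    """Convert an MM:SS offset (minutes may exceed 59) to seconds."""
    minutes, secs = stamp.split(":")
    return int(minutes) * 60 + int(secs)

def signed_mmss(seconds):
    """Format a signed number of seconds as +M:SS or -M:SS."""
    sign = "-" if seconds < 0 else "+"
    minutes, secs = divmod(abs(seconds), 60)
    return f"{sign}{minutes}:{secs:02d}"

# (milestone, VS 2013 time, VS 2015 time), both with /CGTHREADS:8 /PogoSafeMode.
milestones = [
    ("libxul 1 start", "15:36", "13:44"),
    ("libxul 1 finish", "39:16", "26:06"),
    ("profileserver start", "41:40", "28:26"),
    ("profileserver finish", "45:19", "31:53"),
    ("libxul 2 start", "48:01", "33:15"),
    ("libxul 2 finish", "86:22", "47:16"),
    ("complete", "87:24", "48:13"),
]
times = {name: (to_seconds(a), to_seconds(b)) for name, a, b in milestones}

for name, (vs2013, vs2015) in times.items():
    print(f"{name:22s} {signed_mmss(vs2015 - vs2013)}")

# Pass duration = finish - start; percent = 2015 duration / 2013 duration.
percents = {}
for n in (1, 2):
    start, finish = times[f"libxul {n} start"], times[f"libxul {n} finish"]
    d2013, d2015 = finish[0] - start[0], finish[1] - start[1]
    percents[n] = 100 * d2015 / d2013
    print(f"libxul pass {n}: {signed_mmss(d2015 - d2013)} ({percents[n]:.1f}%)")
```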
Comment 13•8 years ago
(In reply to Gregory Szorc [:gps] from comment #12)
> Given the importance of PGO build turnaround time (it is the slowest build
> and prevents tests from running sooner and therefore slows down chemspills
> and mozilla-central integration), unless VS 2015 Talos results show a
> regression, I think upgrading our automation to VS 2015 should be a high
> priority.

Assuming all's well, does that mean we could just turn off opt builds in favor of pgo ones? That would be a major boon in terms of getting accurate and non-redundant talos results.
Assignee
Comment 14•8 years ago
(In reply to William Lachance (:wlach) from comment #13)
> Assuming all's well, does that mean we could just turn off opt builds in
> favor of pgo ones? That would be a major boon in terms of getting accurate
> and non-redundant talos results.

In the ideal world, we'd likely do PGO builds for all builds (or at least builds that touch binaries). I'm not sure if we'd turn off non-PGO builds. I could see that going both ways.
Comment 15•8 years ago
(In reply to Gregory Szorc [:gps] from comment #14)
> (In reply to William Lachance (:wlach) from comment #13)
> > Assuming all's well, does that mean we could just turn off opt builds in
> > favor of pgo ones? That would be a major boon in terms of getting accurate
> > and non-redundant talos results.
>
> In the ideal world, we'd likely do PGO builds for all builds (or at least
> builds that touch binaries). I'm not sure if we'd turn off non-PGO builds. I
> could see that going both ways.

Cool, as long as we have a pgo build for every revision with a binary change, that would be enough to make Talos pgo-only (at least on Windows).
Assignee
Comment 16•8 years ago
Non-PGO builds on 2013 and 2015 appear to take roughly the same amount of wall time (20:50). On preliminary inspection, it looks like there might be a lull towards the end of the build before linking kicks in. The lull on 2015 /might/ be longer than on 2013. I'll have to dig into the logs in more detail and/or add more detail to the logs. So that's one less regression to worry about with switching to 2015.
Assignee
Comment 17•8 years ago
So, the good news is VS 2015 Update 2 CTP appears to be faster than VS 2015 Update 1: the first PGO pass for libxul is 7:21 compared to 12:22!

The bad news is we have another startup crash in the following from libstagefright:

    void VectorImpl::_do_splat(void* dest, const void* item, size_t num) const {
        do_splat(dest, item, num);
    }

The Visual Studio debugger is telling me that the crash is in pgort140.dll, which isn't very comforting. I can't even get symbols for the 4 frames after xul.dll. I'm no C++ debugging guru. Perhaps someone could install VS 2015 Update 2 CTP and help file bugs...
Assignee
Comment 18•8 years ago
Good news: VS 2015 Update 2 RC came out today and it is able to perform a PGO build without crashing (unlike the CTP). The bad news is the PGO pass speedups from the CTP are gone: Update 2 RC appears to build within statistical variance of Update 1. I can only hope that the speedups from CTP will show up again in Update 3. I guess time will tell.
Assignee
Comment 19•8 years ago
I was able to get VS2015u1 PGO running in automation last night. Three builds each of 32-bit and 64-bit completed.

The fastest 32-bit build (http://archive.mozilla.org/pub/firefox/try-builds/gszorc@mozilla.com-95a24d38f004f2baa1146a4dd626ecabcf48a3c0/try-win32/try-win32-bm78-try1-build3976.txt.gz) *job* took 135 minutes. Here are the relevant times since build start from the compilation:

libxul 1 start    27:34
libxul 1 end      49:53 (22:19)
libxul 2 start    65:15
libxul 2 end     108:34 (43:19)

Let's compare that to the fastest PGO job I could find on central (http://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-central-win32-pgo/1457440245/mozilla-central-win32-pgo-bm73-build1-build1.txt.gz):

libxul 1 start    32:24
libxul 1 end      80:00 (47:36)
libxul 2 start   103:26
libxul 2 end     172:38 (69:12)

Pass 1 was >2x faster. Pass 2 was <2x faster. (I had a 3x speedup on pass 2 on my home machine - weird.)

To summarize the PGO build times through the libxul 2nd pass:

                  VS2013    VS2015             | VS2015 (i7-6700K @ home)
libxul 1 start     32:24     27:34 (- 4:50)    | 13:44 (-13:50)
libxul 1 end       80:00     49:53 (-30:07)    | 26:06 (-23:47)
libxul 2 start    103:26     65:15 (-38:11)    | 33:15 (-32:00)
libxul 2 end      172:38    108:34 (-64:04)    | 48:13 (-60:21)

Important takeaways:

* VS2015u1 is >1hr faster than VS2013 doing PGO builds in automation!
* VS2015u1 on my i7-6700K is >1hr faster than VS2015u1 running in automation
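The ">2x" and "<2x" claims follow from the pass durations in the two automation logs. A minimal sketch of that arithmetic, with hypothetical names (`passes`, `duration`, `speedup`), assuming the 65:15 entry in the VS2015 log is the start of the second libxul pass:

```python
def to_seconds(stamp):
    """Convert an MM:SS offset (minutes may exceed 59) to seconds."""
    minutes, secs = stamp.split(":")
    return int(minutes) * 60 + int(secs)

# (start, end) of each libxul PGO pass in automation, from the two logs.
# Assumption: 65:15 is the start of pass 2 in the VS2015 log.
passes = {
    "VS2013": {1: ("32:24", "80:00"), 2: ("103:26", "172:38")},
    "VS2015u1": {1: ("27:34", "49:53"), 2: ("65:15", "108:34")},
}

def duration(compiler, pass_number):
    """Length of one libxul PGO pass in seconds."""
    start, end = passes[compiler][pass_number]
    return to_seconds(end) - to_seconds(start)

def speedup(pass_number):
    """How many times faster VS2015u1 is than VS2013 for a given pass."""
    return duration("VS2013", pass_number) / duration("VS2015u1", pass_number)

for n in (1, 2):
    print(f"pass {n}: {speedup(n):.2f}x faster on VS2015u1")
```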
Assignee
Comment 20•8 years ago
I'm not going to post the numbers, but it looks like 64-bit builds are at least 1 hr faster in automation. Possibly on the order of 90-100 minutes faster!
Assignee
Comment 21•8 years ago
I'm going to say this bug has served its purpose and there is no major work left to do here except to finish the transition to VS 2015 in automation and see what the actual numbers tell us.
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Updated•6 years ago
Product: Core → Firefox Build System