Closed Bug 1411212 Opened 7 years ago Closed 5 months ago

--disable-optimize makes host programs get compiled without optimize flags

Categories

(Firefox Build System :: General, defect)

defect

Tracking

(firefox-esr115 wontfix, firefox120 wontfix, firefox121 wontfix, firefox122 fixed)

RESOLVED FIXED
122 Branch
Tracking Status
firefox-esr115 --- wontfix
firefox120 --- wontfix
firefox121 --- wontfix
firefox122 --- fixed

People

(Reporter: igoldan, Assigned: jwatt)

References

(Blocks 1 open bug)

Details

(Keywords: regression)

Attachments

(1 file)

We have detected a build metrics regression from push:

https://hg.mozilla.org/integration/autoland/pushloghtml?changeset=7ce72fc1ce814f7ddacddf39f0e1da78c6261fd3

As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

 84%  build times summary windows-mingw32-32 debug taskcluster-m4.4xlarge     2,187.25 -> 4,026.99

Improvements:

100%  sccache hit rate summary windows-mingw32-32 debug taskcluster-m4.4xlarge     1.00 -> 0.00


You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=10124

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Automated_Performance_Testing_and_Sheriffing/Build_Metrics
Component: Untriaged → Build Config
Product: Firefox → Core
:tjr Was this increase build time expected?
Flags: needinfo?(tom)
(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #0)
> Improvements:
> 
> 100%  sccache hit rate summary windows-mingw32-32 debug
> taskcluster-m4.4xlarge     1.00 -> 0.00

Going from 1.0 to 0.0 is not an improvement, it's a regression! It's going from 100% cache hit to 0%. That's probably a problem for all the regressions/improvements reported for sccache hits, so this should be fixed (and I don't know where that'd be)
Flags: needinfo?(igoldan)
Thanks for pointing this out!
Flags: needinfo?(igoldan)
Flags: needinfo?(jmaher)
oh we need to indicate that this is a higher == better type of regression.  There is an attribute in PERFHERDER_DATA which we need to set:
"lowerIsBetter": false

http://searchfox.org/mozilla-central/search?q=lowerIsBetter&path=

I am not sure who setup the PERFHERDER_DATA posting from build data, but this should be easy to fix for whoever did it.
Flags: needinfo?(jmaher)
(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #1)
> :tjr Was this increase build time expected?

It sounds reasonable. Before we didn't build tests. Now we do. I bet there's a lot of tests.

As for sscache misses, I would expect us to get a 100% miss rate for the first build since we changed how we build literally everything (I think.) However, I gather from this that we're missing sscache on everything all the time now.  That seems wrong, but I'm not sure what would cause it.
Flags: needinfo?(tom)
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #4)
> oh we need to indicate that this is a higher == better type of regression. 
> There is an attribute in PERFHERDER_DATA which we need to set:
> "lowerIsBetter": false
> 
> http://searchfox.org/mozilla-central/search?q=lowerIsBetter&path=
> 
> I am not sure who setup the PERFHERDER_DATA posting from build data, but
> this should be easy to fix for whoever did it.

Filed bug 1411304 on this.
Since we have a build break from Bug 1411187 I'm thinking this is causing the cache to completely miss all the time.
I'm trying to figure out the sccache regression. One thing I figured out though is:

'Good' sccache results:
> Compile requests                 6
> Compile requests executed        3
> Cache hits                       3
> Cache misses                     0

'Bad' sccache results:
> Compile requests                 6
> Compile requests executed        3
> Cache hits                       0
> Cache misses                     3


There's nothing directly obvious in the sccache.log files. The 'good' one has lots of cache hits (2784) and the broken one doesn't (122). Reverse of misses: 40 for the good one, 2700 for the broken one.

So it seems there's two things at play: Why doesn't sccache invoke for the majority of the MinGW build and why is it missing on the three things it used to invoke on.
Have you managed to look further into this issue?
Flags: needinfo?(tom)
Flags: needinfo?(ted)
Not since Comment 8, no. I guess the next step is to get sccache running locally and starting learning how it works. 

As far as the regression goes, I think the regression is less important than "MinGW doesn't use sccache nearly at all"
Flags: needinfo?(tom)
(In reply to Tom Ritter [:tjr] from comment #10)
> Not since Comment 8, no. I guess the next step is to get sccache running
> locally and starting learning how it works. 
> 
> As far as the regression goes, I think the regression is less important than
> "MinGW doesn't use sccache nearly at all"

Have you filed a bug for this bigger issue? If so, please link it to this bug, using the "See also" field.
Flags: needinfo?(tom)
(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #11)
> (In reply to Tom Ritter [:tjr] from comment #10)
> > Not since Comment 8, no. I guess the next step is to get sccache running
> > locally and starting learning how it works. 
> > 
> > As far as the regression goes, I think the regression is less important than
> > "MinGW doesn't use sccache nearly at all"
> 
> Have you filed a bug for this bigger issue? If so, please link it to this
> bug, using the "See also" field.

I have not, I was planning on just trying to address both in this bug.
Flags: needinfo?(tom)
14:29:47 T<tjr> Does it know how to handle mingw? Cause it doesn't cache anything for it either...
14:32:46 T<@ted> mingw is just gcc, so i suspect so
14:32:55 T<@ted> but maybe it doesn't understand some of the options it uses?
14:33:32 T<@ted> tjr: if you have a local mingw build environment you can try it with sccache logging enabled and see what it thinks
14:33:57 T<@ted> export SCCACHE_ERROR_LOG=/tmp/sccache.log; export RUST_LOG=sccache=trace
14:34:04 T<tjr> I do, that was my next step, although I'm not sure what exactly to look for.  I'll see if anything jumps out
14:34:06 T<@ted> and then grep the log for 'CannotCache'
Ted doesn't need this in his queue.
Flags: needinfo?(ted)
Product: Core → Firefox Build System
Okay, I've finally tracked this one down. If you've got --disable-optimize specified, sccache doesn't do its thing.

MinGW normally has --disable-optimize. Here's a try run with MinGW debug without that: https://treeherder.mozilla.org/#/jobs?repo=try&revision=0b0d6134c566744fd8c1b9f92758a351947146cc&selectedJob=167239048

> [task 2018-03-11T02:43:02.725Z] 02:43:02     INFO -  Compile requests              3628
> [task 2018-03-11T02:43:02.725Z] 02:43:02     INFO -  Compile requests executed     3489
> [task 2018-03-11T02:43:02.725Z] 02:43:02     INFO -  Cache hits                     792
> [task 2018-03-11T02:43:02.725Z] 02:43:02     INFO -  Cache misses                  2697

And linux x86 debug does not specify --disable-optimize. But here's a run where it does specify it: https://treeherder.mozilla.org/#/jobs?repo=try&revision=6f1978c478f17792a8d603634ee5d7ebbd19e5d8&selectedJob=167239067

> [task 2018-03-11T03:41:32.617Z] 03:41:32     INFO -  Compile requests                 6
> [task 2018-03-11T03:41:32.617Z] 03:41:32     INFO -  Compile requests executed        3
> [task 2018-03-11T03:41:32.617Z] 03:41:32     INFO -  Cache hits                       0
> [task 2018-03-11T03:41:32.617Z] 03:41:32     INFO -  Cache misses                     3


This might be intended behavior, but when I asked ted about it (a while ago) he didn't think it was supposed to do this, so tagging him and glandium.
Flags: needinfo?(ted)
Flags: needinfo?(mh+mozilla)
Summary: 84.11% build times (windows-mingw32-32) regression on push 7ce72fc1ce814f7ddacddf39f0e1da78c6261fd3 (Mon Oct 23 2017) → sccache doesn't work when --disable-optimize is specified
> [task 2018-03-11T03:41:32.617Z] 03:41:32     INFO -  Compile requests                 6

This seems bad, but maybe it's a red herring? Looking at the compile log shows that sccache is in the compile commandline:

[task 2018-03-11T02:21:01.634Z] 02:21:01     INFO -  /builds/worker/workspace/build/src/sccache2/sccache /builds/worker/workspace/build/src/gcc/bin/g++ -m32 -march=pentium-m -o mozalloc_abort.o -c -DDEBUG=1 -D_GNU_SOURCE -DIMPL_MFBT -DMOZ_HAS_MOZGLUE -I/builds/worker/workspace/build/src/memory/mozalloc -I/builds/worker/workspace/build/src/obj-firefox/memory/mozalloc -I/builds/worker/workspace/build/src/obj-firefox/xpcom -I/builds/worker/workspace/build/src/memory/build -I/builds/worker/workspace/build/src/obj-firefox/dist/include -I/builds/worker/workspace/build/src/obj-firefox/dist/include/nspr -I/builds/worker/workspace/build/src/obj-firefox/dist/include/nss -fPIC -DMOZILLA_CLIENT -include /builds/worker/workspace/build/src/obj-firefox/mozilla-config.h -Wall -Wempty-body -Wignored-qualifiers -Woverloaded-virtual -Wpointer-arith -Wsign-compare -Wtype-limits -Wunreachable-code -Wwrite-strings -Wno-invalid-offsetof -Wduplicated-cond -Wno-error=maybe-uninitialized -Wno-error=deprecated-declarations -Wno-error=array-bounds -Wno-error=free-nonheap-object -Wformat -D_GLIBCXX_USE_CXX11_ABI=0 -fno-sized-deallocation -msse -msse2 -mfpmath=sse -fno-exceptions -fno-strict-aliasing -fno-rtti -ffunction-sections -fdata-sections -fno-exceptions -fno-math-errno -pthread -pipe -g -fno-omit-frame-pointer -Werror  -MD -MP -MF .deps/mozalloc_abort.o.pp   /builds/worker/workspace/build/src/memory/mozalloc/mozalloc_abort.cpp

I think what this indicates is just that --disable-optimize builds take longer for some reason, so sccache is hitting its idle timeout and quitting and so we lose the stats. sccache has an `SCCACHE_IDLE_TIMEOUT` option that glandium added which can be used to work around this, we're using it to work around the fact that clang takes forever to compile one source file for Android builds:
https://dxr.mozilla.org/mozilla-central/rev/a6f5fb18e6bcc9bffe4a0209a22d8a25510936be/taskcluster/ci/build/android.yml#62

Maybe you can do a try push with `SCCACHE_IDLE_TIMEOUT=0` in the environment to avoid that so we get more sensible stats?

As a sanity check, I downloaded sccache.log.gz from that linux x86 debug build, and we do have some cache hits there (mostly Rust compilations):
$ rg 'Cache miss' sccache.log  | wc -l
3339

$ rg 'Cache hit' sccache.log  | wc -l
277
Flags: needinfo?(ted)
Also, looking at sccache.log from a recent mingw32 debug build on inbound:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&selectedJob=167436147

$ rg 'Cache miss' sccache.log  | wc -l
4

$ rg 'Cache hit' sccache.log  | wc -l
3466

...so it doesn't look like we're failing to use sccache here.
I noticed that mingw builds take forever to link, I wouldn't be surprised if that hits the sccache timeout.
Flags: needinfo?(mh+mozilla)
Setting SCCACHE_IDLE_TIMEOUT=0 causes a build failure

https://treeherder.mozilla.org/#/jobs?repo=try&revision=a08ea6bc55b7a85ea7f95d6783fccb5f1ec40d6c&selectedJob=167522334

> # Terminate any sccache server that might still be around.
> /builds/worker/workspace/build/src/sccache2/sccache --stop-server >> /builds/worker/workspace/build/src/obj-firefox/dist/sccache-orig.log 2>&1
> # Start a new server, ensuring it gets the jobserver file descriptors
> # from make (but don't use the + prefix when make -n is used, so that
> # the command doesn't run in that case)
> env RUST_LOG=sccache::compiler=debug SCCACHE_ERROR_LOG=/builds/worker/workspace/build/src/obj-firefox/dist/sccache.log /builds/worker/workspace/build/src/sccache2/sccache --start-server
> Starting sccache server...
> error: Server startup error
> caused by: Server startup failed: 1
> client.mk:80: recipe for target 'build' failed

Here's the output of the shutdown:

> $ cat sccache-orig.log
> Stopping sccache server...
> Compile requests               287
> Compile requests executed      156
> Cache hits                      26
> Cache misses                    78
> Cache timeouts                   0
> Cache read errors                0
> Forced recaches                  0
> Cache write errors               0
> Compilation failures             9
> Cache errors                    43
> Non-cacheable compilations       0
> Non-cacheable calls             26
> Non-compilation calls          105
> Unsupported compiler calls       0
> Average cache write          0.039 s
> Average cache read miss      0.052 s
> Average cache read hit       0.043 s
> Cache location             S3, bucket: Bucket(name=taskcluster-level-1-sccache-us-east-1, base_url=http://taskcluster-level-1-sccache-us-east-1.s3.amazonaws.com/)

And of the error:

> $ cat /builds/worker/workspace/build/src/obj-firefox/dist/sccache.log 
> error: Address already in use (os error 98)
Oh right, tomprince fixed that in sccache but we haven't updated to pick up that fix:
https://github.com/mozilla/sccache/pull/211
I filed bug 1445218 to try out updating to sccache 0.2.6 which has that fix.
0.2.6 and SCCACHE_IDLE_TIMEOUT=0 seems to solve the problem

https://treeherder.mozilla.org/#/jobs?repo=try&revision=22160a5c671f6cb423346323e83884b6fba514b9

> /builds/worker/workspace/build/src/sccache2/sccache --show-stats
> Compile requests              3795
> Compile requests executed     3623
> Cache hits                     335
> Cache misses                  3288

What would be the best thing to do here? Detect --disable-optimize and up the Idle Timeout?
Actually I was thinking that the best fix would be to make sccache persist its stats to disk. We can then just explicitly run `sccache --zero-stats` after starting the server so we get fresh stats per-build.
Weird. A retrigger of the build job on that try run had the same hit rate.

> Compile requests              3795
> Compile requests executed     3623
> Cache hits                     334
> Cache misses                  3289

Duration of 66 minutes vs the original 75.
That is strange! I would expect it to be near-100% on cache hits. I'll see if the logs have anything interesting.
Your retrigger gets a near 100% hit rate.

> Compile requests              3795
> Compile requests executed     3623
> Cache hits                    3607
> Cache misses                    16


The runtime: 58 minutes. Compared to 66 or 75 with a very low hit rate.  So whatever is happening to make the build slow, it's not really sccache.
It looks like host programs are being built without optimization, which I think would explain things:
[task 2018-03-13T16:14:20.821Z] 16:14:20     INFO -  /builds/worker/workspace/build/src/sccache2/sccache /builds/worker/workspace/build/src/gcc/bin/gcc -m32 -march=pentium-m -std=gnu99 -o host_pathsub.o -c  -DXP_UNIX -DDEBUG=1 -D_UNICODE -DUNICODE -I/builds/worker/workspace/build/src/config -I/builds/worker/workspace/build/src/obj-firefox/config -I/builds/worker/workspace/build/src/obj-firefox/dist/include -MD -MP -MF .deps/host_pathsub.o.pp -I/builds/worker/workspace/build/src/obj-firefox/dist/include/nspr /builds/worker/workspace/build/src/config/pathsub.c

I had checked this on my local --disable-optimize build and thought it looked sensible, but I must have been looking at a bad example, because:
/home/luser/.cargo/bin/sccache /usr/bin/gcc -std=gnu99 -o host_pathsub.o -c  -DXP_UNIX -DDEBUG=1 -D_UNICODE -DUNICODE -I/build/mozilla-central/config -I/build/debug-mozilla-central/config -I/build/debug-mozilla-central/dist/include -MD -MP -MF .deps/host_pathsub.o.pp -I/build/debug-mozilla-central/dist/include/nspr /build/mozilla-central/config/pathsub.c

We have a separate `HOST_OPTIMIZE_FLAGS`:
https://dxr.mozilla.org/mozilla-central/rev/6ff60a083701d08c52702daf50f28e8f46ae3a1c/old-configure.in#730

And we're supposed to honor them, but maybe this isn't working as intended:
https://dxr.mozilla.org/mozilla-central/rev/6ff60a083701d08c52702daf50f28e8f46ae3a1c/python/mozbuild/mozbuild/frontend/context.py#347

I note that in the non-cross-compile case, we'll honor `MOZ_OPTIMIZE`, which doesn't seem right. We should always build host programs with optimization, regardless of the state of `--disable-optimize`.

Unrelated, but I noticed in the build log that the mingw builds have:
--host=i686-pc-linux

which seems odd. Do we need that for some reason, or could they just build 64-bit binaries?
Summary: sccache doesn't work when --disable-optimize is specified → --disable-optimize makes host programs get compiled without optimize flags
No longer depends on: 1445218
Hm, on a recent MinGW win32 build I see the optimization flags:

> /builds/worker/workspace/build/src/sccache2/sccache /bin/gcc -std=gnu99 -o host_pathsub.o -c  -DXP_UNIX -O3 -DDEBUG=1 -D_UNICODE -DUNICODE -I/builds/worker/workspace/build/src/config -I/builds/worker/workspace/build/src/obj-firefox/config -I/builds/worker/workspace/build/src/obj-firefox/dist/include -MD -MP -MF .deps/host_pathsub.o.pp -I/builds/worker/workspace/build/src/obj-firefox/dist/include/nspr /builds/worker/workspace/build/src/config/pathsub.c
> 
> /builds/worker/workspace/build/src/obj-firefox/_virtualenv/bin/python /builds/worker/workspace/build/src/config/expandlibs_exec.py -- /builds/worker/workspace/build/src/sccache2/sccache /bin/gcc -std=gnu99 -o nsinstall_real -DXP_UNIX -O3  host_nsinstall.o host_pathsub.o

And on a MinGW win64 build I see the same thing. https://treeherder.mozilla.org/#/jobs?repo=try&revision=395b1a6e68a3752cdc56333191e9d091026bd3e7&selectedJob=170963246  (That job timed out, but the optimize flags are still there)
Severity: normal → S3
Blocks: 1867594

On a 16-core M3 MBP, my clobber build times are:

9m 29s with --disable-optimize: https://share.firefox.dev/414j6sV
6m 37s with --enable-optimize="-O1": https://share.firefox.dev/3Rn50Qf

--disable-optimize causing this 50% increase in build times seems to be down to host tools being affected by --disable-optimize, as pointed out by Mike in bug 1867594 comment 7.

Looking at our build telemetry, it looks like about 70%-80% of developer builds use --disable-optimize, that number being over 80% on macOS:
https://sql.telemetry.mozilla.org/queries/96326/source?p_date=d_last_30_days#237829

I guess the usage is probably so high for historical reasons, since --disable-optimize --disable-debug has in the past been recommended for faster build times while still generating a reasonably fast Firefox build.

But now that we have vendored in so many host tools, and until this bug is fixed, --disable-optimize is probably something to avoid when possible. Particularly on Apple silicon.

At the risk of muddying the waters a bit, I'll share some Linux times here too.

On a 7950X desktop, my clobber times are:

4m 55s with --disable-optimize: https://share.firefox.dev/47Wpkxi
6m 32s with --enable-optimize="-O1": https://share.firefox.dev/47Wpkxi

So on Linux, the --disable-optimize build is the faster of the two (the opposite of macOS). The rlbox.wasm.c step still takes longer with --disable-optimize, taking 34s vs. 17s, but that's much, much less bad that on macOS, where it takes 3m 4s vs 11s. However, in the Linux profiles the rlbox.wasm.c step still seems to be a long pole item, so fixing this bug could potentially reduce that 4m 55s time by 20s or so.

For the record, the mozconfig I used for the builds I discussed in the above two comments was:

ac_add_options --disable-launcher-process
ac_add_options --enable-layout-debugger
ac_add_options --disable-optimize
#ac_add_options --enable-optimize="-O1"
ac_add_options --disable-debug
ac_add_options --disable-release
ac_add_options --enable-dump-painting
ac_add_options --enable-debug-symbols
ac_add_options --enable-profiling
ac_add_options --disable-crashreporter

Prior to this change, --disable-optimize clobber builds on macOS took 50%
longer. Build telemetry shows over 80% of developer builds on macOS use
--disable-optimize.

The defaults for HOST_OPTIMIZE_FLAGS that are set in old-configure.in can be
overriden by setting the variable in your mozconfig. For example, setting:

HOST_OPTIMIZE_FLAGS=-O1

Assignee: nobody → jwatt
Status: NEW → ASSIGNED

This patch makes a very significant difference to --disable-optimize builds on Apple silicon macs. A couple of test clobber build profiles:

8m 56s before: https://share.firefox.dev/3R4jGlX
6m 13s after: https://share.firefox.dev/3R5Bm0E

Pushed by jwatt@jwatt.org:
https://hg.mozilla.org/integration/autoland/rev/975c80dda5fe
Use HOST_OPTIMIZE_FLAGS to compile host tools, even when not cross-compiling. r=glandium
Pushed by jwatt@jwatt.org:
https://hg.mozilla.org/integration/autoland/rev/82b65bcfa32a
Use HOST_OPTIMIZE_FLAGS to compile host tools, even when not cross-compiling. r=glandium
Status: ASSIGNED → RESOLVED
Closed: 5 months ago
Resolution: --- → FIXED
Target Milestone: --- → 122 Branch
Flags: needinfo?(jwatt)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: