Closed Bug 1494216 Opened 6 years ago Closed 6 years ago

3.08 - 7.89% build times (linux32, linux64) regression on push 5ab8b903147a (Wed Sep 19 2018)

Categories

(Testing :: General, defect)

64 Branch
All
Linux
defect
Not set
normal

Tracking

(firefox-esr60 unaffected, firefox62 unaffected, firefox63 unaffected, firefox64 fixed)

VERIFIED FIXED
mozilla64

People

(Reporter: igoldan, Assigned: glandium)

Details

(Keywords: regression)

Attachments

(1 file)

We have detected a build metrics regression from push:

https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=0e28dd35739698fc6794ca4aa218975c512acef1&tochange=5ab8b903147a0cc97b21d278299840b9e38aa1f6

As you are the author of one of the patches included in that push, we need your help to address this regression.

Regressions:

  8%  build times linux64 opt plain taskcluster-c5d.4xlarge      1,106.42 -> 1,193.72
  6%  build times linux64 opt taskcluster-m5.4xlarge tup         906.49 -> 964.44
  6%  build times linux64 opt taskcluster-c4.4xlarge tup         1,103.33 -> 1,171.70
  6%  build times linux64 debug plain taskcluster-c4.4xlarge     1,220.40 -> 1,295.69
  6%  build times linux64 pgo taskcluster-c5d.4xlarge            4,120.88 -> 4,363.25
  5%  build times linux64 opt taskcluster-m4.4xlarge tup         1,185.24 -> 1,249.86
  4%  build times linux64 pgo taskcluster-c4.4xlarge             4,936.97 -> 5,127.41
  3%  build times linux32 pgo taskcluster-c4.4xlarge             5,208.63 -> 5,368.89
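
For reference, the percentages in the table above can be recomputed from the before/after build times. This is a minimal sketch, not part of the original alert: the `rows` list is copied by hand from the table, and `percent_change` is a hypothetical helper name.

```python
# Build times in seconds (before, after), copied from the alert table above.
rows = [
    ("linux64 opt plain c5d.4xlarge", 1106.42, 1193.72),
    ("linux64 opt m5.4xlarge tup", 906.49, 964.44),
    ("linux64 opt c4.4xlarge tup", 1103.33, 1171.70),
    ("linux64 debug plain c4.4xlarge", 1220.40, 1295.69),
    ("linux64 pgo c5d.4xlarge", 4120.88, 4363.25),
    ("linux64 opt m4.4xlarge tup", 1185.24, 1249.86),
    ("linux64 pgo c4.4xlarge", 4936.97, 5127.41),
    ("linux32 pgo c4.4xlarge", 5208.63, 5368.89),
]


def percent_change(before, after):
    """Relative increase of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


# Map each configuration to its regression, rounded to two decimals.
changes = {label: round(percent_change(b, a), 2) for label, b, a in rows}

for label, pct in changes.items():
    print(f"{pct:5.2f}%  {label}")

smallest = min(changes.values())
largest = max(changes.values())
print(f"range: {smallest} - {largest}%")
```

The smallest and largest values come out as 3.08 and 7.89, which matches the "3.08 - 7.89%" range in the bug summary.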


You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=16049

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Automated_Performance_Testing_and_Sheriffing/Build_Metrics
The primary suspects for these increases are bug 1492037 and bug 1483780, but I think bug 1492037 is the more likely cause.

:froydnj am I correct?
Flags: needinfo?(nfroyd)
That seems like a reasonable candidate, though that's a pretty extreme slowdown.  I'd think that would only happen for the case of many shared libraries, and a lot of cross-library calls...though perhaps libLLVM -> libLLVM calls still go through the PLT and thus incur a performance penalty?
Flags: needinfo?(nfroyd) → needinfo?(mh+mozilla)
(In reply to Nathan Froyd [:froydnj] from comment #2)
> That seems like a reasonable candidate, though that's a pretty extreme
> slowdown.  I'd think that would only happen for the case of many shared
> libraries, and a lot of cross-library calls...though perhaps libLLVM ->
> libLLVM calls still go through the PLT and thus incur a performance penalty?

From a cursory look at the libLLVM.so disassembly, it seems like that's what it is. LLVM internal calls are going through the PLT for all the symbols libLLVM exports. Which makes sense in a unix purity sense, but really doesn't for our purposes.

I'm going to try to stick a -Bsymbolic in there and see how it goes.
Flags: needinfo?(mh+mozilla)
With libLLVM being a shared library exporting many symbols, all internal
calls to those symbols go through the PLT by default, which is
unnecessary (and costly) overhead. Using -Bsymbolic makes internal calls
go directly to the right place without going through the PLT.
I confirmed locally that this does seem to bring us back to about the same perf as before (maybe slightly slower, but not as dramatic a difference).
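
To illustrate the binding difference described above, here is a toy model in Python. It is only a sketch of the lookup semantics, not how a real linker or dynamic loader is implemented; `resolve_call` and the symbol name `llvm_helper` are hypothetical and exist only for this illustration.

```python
# Toy model of how a call made *inside* a shared library gets bound.
# This mimics the semantics only; it is not real linker or loader code.

def resolve_call(symbol, library_defs, global_scope, bsymbolic):
    """Return (target, went_through_plt) for an internal call to `symbol`."""
    if bsymbolic and symbol in library_defs:
        # With -Bsymbolic the reference is bound at link time to the
        # library's own definition, so the call is direct.
        return library_defs[symbol], False
    # Default behaviour: the call is routed through the PLT and the first
    # definition found in global lookup order wins.
    for scope in global_scope:
        if symbol in scope:
            return scope[symbol], True
    raise LookupError(symbol)


# Hypothetical exported symbol defined by the library itself.
lib = {"llvm_helper": "libLLVM:llvm_helper"}
exe = {}  # the executable does not define the symbol
lookup_order = [exe, lib]

default_binding = resolve_call("llvm_helper", lib, lookup_order, bsymbolic=False)
symbolic_binding = resolve_call("llvm_helper", lib, lookup_order, bsymbolic=True)

print(default_binding)   # same target, reached via the PLT
print(symbolic_binding)  # same target, reached directly
```

In both cases the call ends up at the library's own definition. The only difference in this model is whether the PLT indirection sits in between, which is the overhead the patch aims to remove.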
Assignee: nobody → mh+mozilla
Comment on attachment 9012424 [details]
Bug 1494216 - Use the -Bsymbolic linker flag to build clang

Nathan Froyd [:froydnj] has approved the revision.
Attachment #9012424 - Flags: review+
Pushed by mh@glandium.org:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d4f56bcc3896
Use the -Bsymbolic linker flag to build clang. r=froydnj
Backout by mh@glandium.org:
https://hg.mozilla.org/integration/mozilla-inbound/rev/b48f60cd2656
Backout changeset d4f56bcc3896 to give time to toolchains to build without blocking other landings.
Pushed by mh@glandium.org:
https://hg.mozilla.org/integration/mozilla-inbound/rev/5d60b68a0a42
Use the -Bsymbolic linker flag to build clang. r=froydnj
Backout by dluca@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/5c2c4e17f97a
Backed out changeset 5d60b68a0a42 for xpcshell failures in memory/replace/dmd/test/test_dmd.js. CLOSED TREE
I'm mystified. How can this be related to how the compiler is linked?! Also, what are those lines with timestamps in the stack trace?!?

20:36:19     INFO -  PID 7385 | --- /Users/cltbld/tasks/task_1538098695/build/tests/xpcshell/tests/memory/replace/dmd/test/complete-full1-live-expected.txt	2015-12-31 16:00:00.000000000 -0800
20:36:19     INFO -  PID 7385 | +++ /Users/cltbld/tasks/task_1538098695/build/tests/xpcshell/tests/memory/replace/dmd/test/complete-full1-live-actual.txt	2018-09-27 20:36:19.000000000 -0700
20:36:19     INFO -  PID 7385 | @@ -13,7 +13,11 @@
20:36:19     INFO -  PID 7385 |    8,192 bytes (7,169 requested / 1,023 slop)
20:36:19     INFO -  PID 7385 |    67.72% of the heap (67.72% cumulative)
20:36:19     INFO -  PID 7385 |    Allocated at {
20:36:19     INFO -  PID 7385 | -    #01: ... DMD.cpp ...
20:36:19     INFO -  PID 7385 | +    #01: 2018-09-27 20:36:19.169 atos[7405:136841] Metadata.framework [Error]: couldn't get the client port
20:36:19     INFO -  PID 7385 | +    #02: TestFull(char const*, int, char const*, int) (in SmokeDMD) + 867
20:36:19     INFO -  PID 7385 | +    #03: TestFull(char const*, int, char const*, int) (in SmokeDMD) + 563
20:36:19     INFO -  PID 7385 | +    #04: 2018-09-27 20:36:19.145 atos[7403:136834] Metadata.framework [Error]: couldn't get the client port
20:36:19     INFO -  PID 7385 | +    #05: 2018-09-27 20:36:19.192 atos[7407:136848] Metadata.framework [Error]: couldn't get the client port
20:36:19     INFO -  PID 7385 |    }
20:36:19     INFO -  PID 7385 |  }
20:36:19     INFO -  PID 7385 | @@ -22,7 +26,11 @@
20:36:19     INFO -  PID 7385 |    1,024 bytes (1,023 requested / 1 slop)
20:36:19     INFO -  PID 7385 |    8.47% of the heap (76.19% cumulative)
20:36:19     INFO -  PID 7385 |    Allocated at {
20:36:19     INFO -  PID 7385 | -    #01: ... DMD.cpp ...
20:36:19     INFO -  PID 7385 | +    #01: 0x000117ad (in libmozglue.dylib) + 189
20:36:19     INFO -  PID 7385 | +    #02: TestFull(char const*, int, char const*, int) (in SmokeDMD) + 921
20:36:19     INFO -  PID 7385 | +    #03: TestFull(char const*, int, char const*, int) (in SmokeDMD) + 563
20:36:19     INFO -  PID 7385 | +    #04: 2018-09-27 20:36:19.145 atos[7403:136834] Metadata.framework [Error]: couldn't get the client port
20:36:19     INFO -  PID 7385 | +    #05: 2018-09-27 20:36:19.192 atos[7407:136848] Metadata.framework [Error]: couldn't get the client port
20:36:19     INFO -  PID 7385 |    }
20:36:19     INFO -  PID 7385 |  }
Pushed by mh@glandium.org:
https://hg.mozilla.org/integration/mozilla-inbound/rev/99b28f8874bb
Use the -Bsymbolic-functions linker flag to build clang. r=froydnj
Backout by mh@glandium.org:
https://hg.mozilla.org/integration/mozilla-inbound/rev/8444d933e44c
Backout changeset 99b28f8874bb to give time to toolchains to build without blocking other landings.
Pushed by mh@glandium.org:
https://hg.mozilla.org/integration/mozilla-inbound/rev/1ba04509b37a
Use the -Bsymbolic-functions linker flag to build clang. r=froydnj
https://hg.mozilla.org/mozilla-central/rev/1ba04509b37a
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla64
Because the drop in build time is not as large as one would expect (https://treeherder.mozilla.org/perf.html#/graphs?series=mozilla-inbound,1682354,1,2&selected=mozilla-inbound,1682354,386387,588387306), I did try runs against the tree as of the landing of bug 1492037. Here are the results:

Revision before bug 1492037 vs. bug 1492037 + the patch from this bug:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=9362703cb6db&newProject=try&newRevision=10850ef0cb30&framework=2

For reference: revision before bug 1492037 vs. bug 1492037
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=9362703cb6db&newProject=try&newRevision=643f1454787f&framework=2

I think we can conclude that the patch from this bug fixes the regression as it was reported, but it seems clear that something else regressed build times in between, and I think it's cranelift.
Status: RESOLVED → VERIFIED