Provide PGO builds of clang
Categories
(Firefox Build System :: General, defect)
Tracking
(firefox77 fixed)
| Tracking | Status | |
|---|---|---|
| firefox77 | --- | fixed |
People
(Reporter: erahm, Assigned: away)
Attachments
(7 files, 1 obsolete file)
As a platform developer I'm interested in using a PGO build of clang (or gcc) that is trained on the Firefox codebase. I attempted to do this on my own and failed horribly, but it seems like something our build team could provide. I don't need anything terribly special or smart, just a standard archive such as what llvm.org provides [1]. Personally I'd use this on Ubuntu 16.04 x86_64; I imagine a build for OSX would be welcome as well.
It's also possible this isn't worth the effort, so evaluating whether a PGO build of clang actually performs better should be a prerequisite.
[1] http://releases.llvm.org/download.html#3.9.0
Comment 1•4 years ago
I came across this, where they compiled FF with PGO GCC: https://gcc.gnu.org/ml/gcc/2013-03/msg00210.html
> Firefox:
> vanilla: 5143.27s user 267.27s system 346% cpu 26:02.03 total
> PGO : 4590.37s user 270.21s system 344% cpu 23:28.89 total
> LTO : 5056.11s user 268.04s system 348% cpu 25:28.73 total
> LTO+PGO: 4598.79s user 269.01s system 347% cpu 23:22.13 total
>
> * GCC build with PGO is ~10% faster than a vanilla bootstrapped compiler.
So, looks like a worthwhile optimisation.
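As a quick sanity check on the "~10% faster" figure, the wall-clock totals quoted above can be compared directly. This is only a minimal sketch: the helper names `to_seconds` and `reduction_percent` are made up for illustration, and the four totals are the only values taken from the quoted message.

```python
def to_seconds(clock):
    """Convert 'MM:SS.ss' wall-clock text (as printed by `time`) to seconds."""
    minutes, seconds = clock.split(":")
    return int(minutes) * 60 + float(seconds)


def reduction_percent(baseline, candidate):
    """Percentage of wall-clock time saved relative to the baseline."""
    return 100.0 * (baseline - candidate) / baseline


# Wall-clock totals from the quoted GCC mailing list message.
totals = {
    "vanilla": "26:02.03",
    "PGO": "23:28.89",
    "LTO": "25:28.73",
    "LTO+PGO": "23:22.13",
}

baseline = to_seconds(totals["vanilla"])
reductions = {
    name: reduction_percent(baseline, to_seconds(clock))
    for name, clock in totals.items()
    if name != "vanilla"
}

for name, pct in reductions.items():
    print(f"{name}: {pct:.1f}% less wall-clock time than vanilla")
```

Comparing the "user" CPU seconds instead of the wall-clock totals would be an equally reasonable choice; the quoted message does not say which of the two its ~10% refers to.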
Comment 2•4 years ago (Reporter)
(In reply to mayankleoboy1 from comment #1)
> So, looks like a worthwhile optimisation.
That's great! Unfortunately I'm not the one to implement this; hopefully someone on the build team will take a look.
Comment 3•3 years ago
FWIW, I got apparent improvements in build time by building my compiler from source, as well as by building it with PGO: https://jdashg.github.io/misc/pgo-clang.html
I can back up a ~10% compile-time advantage from PGO, though this was only a 5% total build-time advantage on my -j32 machine. Compiling with -O3 and -march=native was a much larger win, so I'm not sure distributing PGO binaries is the right solution.
I did not test -O3 without -march=native. That is worth following up on, but I expect the performance of my system compiler package (Arch Linux) to be similar to a build without -march=native.
Comment 4•3 years ago
Actually in more controlled testing, I did find PGO to be the largest win by far. (12m25s->10m30s)
Comment 5•2 years ago
Some benchmarks for LLVM 9 with LTO/PGO:
https://gist.github.com/nathanchance/7bc40942ac0a86490e692b63345e6f6a
Separating out the mechanical/"boring" changes to make the next patch more clear. This patch adds the ability to build a fourth stage that for now doesn't do anything special.
I changed to using >= to make it more obvious that e.g. "here is what's going to happen for stage 2" -- the off-by-one was too hard on my brain.
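To illustrate the `>=` style described here, the following is a hypothetical sketch rather than the actual build-clang code: `stage_plan` and its messages are invented for illustration. Each `if stages >= N` block reads as "here is what happens for stage N", so adding a fourth stage is one more block with no off-by-one arithmetic.

```python
def stage_plan(stages):
    """List what a build with `stages` stages would do, one entry per stage."""
    if not 1 <= stages <= 4:
        raise ValueError("stages must be between 1 and 4")

    plan = ["stage1: build clang with the host compiler"]
    if stages >= 2:
        # Everything in this block is "what happens for stage 2".
        plan.append("stage2: rebuild clang with the stage1 compiler")
    if stages >= 3:
        plan.append("stage3: rebuild clang with the stage2 compiler")
    if stages >= 4:
        # A fourth stage that, for now, does nothing special.
        plan.append("stage4: rebuild clang with the stage3 compiler")
    return plan


for line in stage_plan(4):
    print(line)
```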
This adds the ability to do four-stage PGO builds. This was surprisingly straightforward thanks to PGO being a well-supported scenario in LLVM's cmake.
For reference, the stages are:
stage1: Initial build with gcc
stage2: Instrumented build using stage1
stage3: Train by using the instrumented stage2 to build the clang tree
stage4: Optimize using the stage3 compiler and the profdata created with it
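The four stages above map onto LLVM's CMake PGO options roughly as sketched below. This is not the code from the patch: `pgo_cmake_args`, `merge_profiles_command`, and the default `merged.profdata` path are hypothetical names chosen for illustration. `LLVM_BUILD_INSTRUMENTED`, `LLVM_BUILD_RUNTIME`, and `LLVM_PROFDATA_FILE` are LLVM CMake options, but the `IR` instrumentation mode is an assumption about what the real build uses.

```python
def pgo_cmake_args(stage, profdata="merged.profdata"):
    """Extra CMake arguments for one stage of a four-stage PGO build."""
    if stage not in (1, 2, 3, 4):
        raise ValueError("stage must be 1, 2, 3 or 4")
    if stage == 2:
        # Instrumented build: emits raw profiles when it is later used to compile.
        return ["-DLLVM_BUILD_INSTRUMENTED=IR", "-DLLVM_BUILD_RUNTIME=No"]
    if stage == 4:
        # Optimized build: consumes the merged profile produced by training.
        return ["-DLLVM_PROFDATA_FILE=" + profdata]
    # Stage 1 (initial build) and stage 3 (training run) need no PGO flags.
    return []


def merge_profiles_command(raw_profiles, profdata="merged.profdata"):
    """Command line that merges raw training profiles into one profdata file."""
    return ["llvm-profdata", "merge", "-o", profdata] + list(raw_profiles)


for stage in (1, 2, 3, 4):
    print(stage, pgo_cmake_args(stage))
print(" ".join(merge_profiles_command(["a.profraw", "b.profraw"])))
```

The merge step sits between the training stage and the final stage: the instrumented compiler writes `.profraw` files while building the clang tree, and the merged file is what the last stage reads.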
Depends on D69079
Otherwise, PGO builds would fail to find asan at stage2 because the instrumented build uses LLVM_BUILD_RUNTIME=No.
Depends on D69080
Comment 10•1 year ago (Assignee)
Comment 11•1 year ago (Assignee)
Depends on D69083
Comment 12•1 year ago (Assignee)
Comment 13•1 year ago (Assignee)
This will partially atone for making builds longer with PGO.
Depends on D69618
Comment 14•1 year ago
Pushed by dmajor@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a64593d4c645 build-clang: pass is_final_stage even for stage1. r=glandium
https://hg.mozilla.org/integration/autoland/rev/441282fd1fea build-clang: avoid building unnecessary things in intermediate stages. r=glandium
https://hg.mozilla.org/integration/autoland/rev/8adfc59eb3c5 build-clang: Install imports and asan symbols only in the final stage r=glandium
https://hg.mozilla.org/integration/autoland/rev/96701b9815c2 build-clang: Merge LLVM a84b200e604 to fix Windows PGO. r=glandium
https://hg.mozilla.org/integration/autoland/rev/493c338ad705 build-clang: Add support for 4-stage builds r=glandium
https://hg.mozilla.org/integration/autoland/rev/e04c57ed0c04 build-clang: Add support for PGO builds. r=glandium
https://hg.mozilla.org/integration/autoland/rev/810a84a18948 build-clang: Convert 3-stage builds to 4-stage PGO builds. r=glandium
Comment 15•1 year ago
Backed out 7 changesets for causing bpgo failures.
Backout link: https://hg.mozilla.org/integration/autoland/rev/670150d0ab6626a805f6a1be61531c01bbdbc8be
Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=296609765&repo=autoland&lineNumber=4185
Comment 16•1 year ago
Pushed by dmajor@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/084f9e2cdb83 build-clang: pass is_final_stage even for stage1. r=glandium
https://hg.mozilla.org/integration/autoland/rev/f4874ffeac44 build-clang: avoid building unnecessary things in intermediate stages. r=glandium
https://hg.mozilla.org/integration/autoland/rev/4f67a5a67abf build-clang: Install imports and asan symbols only in the final stage r=glandium
https://hg.mozilla.org/integration/autoland/rev/3129eeee4887 build-clang: Merge LLVM a84b200e604 to fix Windows PGO. r=glandium
https://hg.mozilla.org/integration/autoland/rev/21780faba92c build-clang: Add support for 4-stage builds r=glandium
https://hg.mozilla.org/integration/autoland/rev/475d78d8da12 build-clang: Add support for PGO builds. r=glandium
https://hg.mozilla.org/integration/autoland/rev/f7432af8daf3 build-clang: Convert 3-stage builds to 4-stage PGO builds. r=glandium
Comment 17•1 year ago
== Change summary for alert #25582 (as of Thu, 09 Apr 2020 02:25:56 GMT) ==
Improvements:
| Improvement | Test | Platform and options | Before | After |
|---|---|---|---|---|
| 9% | build times | linux64 debug plain taskcluster-c5.4xlarge | 1,339.46 | 1,213.62 |
| 9% | build times | linux64 debug plain taskcluster-m5.4xlarge | 1,407.98 | 1,277.07 |
| 9% | build times | linux64 debug plain taskcluster-c5d.4xlarge | 1,316.49 | 1,203.20 |
| 8% | build times | windows2012-64 debug plain taskcluster-c4.4xlarge | 2,679.84 | 2,466.41 |
| 7% | build times | windows2012-aarch64 opt aarch64-no-eme nightly taskcluster-c5.4xlarge | 2,287.62 | 2,132.88 |
| 6% | build times | windows2012-32-shippable opt nightly taskcluster-c5d.4xlarge | 2,266.06 | 2,131.84 |
| 6% | build times | windows2012-aarch64 opt aarch64-no-eme nightly taskcluster-m5.4xlarge | 2,393.91 | 2,255.14 |
| 5% | build times | android-5-0-aarch64 pgo taskcluster-m5.4xlarge | 2,350.24 | 2,234.64 |
For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=25582
Comment 18•1 year ago (bugherder)
https://hg.mozilla.org/mozilla-central/rev/084f9e2cdb83
https://hg.mozilla.org/mozilla-central/rev/f4874ffeac44
https://hg.mozilla.org/mozilla-central/rev/4f67a5a67abf
https://hg.mozilla.org/mozilla-central/rev/3129eeee4887
https://hg.mozilla.org/mozilla-central/rev/21780faba92c
https://hg.mozilla.org/mozilla-central/rev/475d78d8da12
https://hg.mozilla.org/mozilla-central/rev/f7432af8daf3
Comment 19•1 year ago
(In reply to Alexandru Ionescu :alexandrui (needinfo me) from comment #17)
Congrats!
Comment 20•1 year ago
(it's reassuring that this matches our experiments!)