Closed Bug 1326486 Opened 4 years ago Closed 4 months ago

Provide PGO builds of clang

Categories

(Firefox Build System :: General, defect)

Type: defect, Priority: Not set, Severity: normal

Tracking

(firefox77 fixed)

RESOLVED FIXED
mozilla77

People

(Reporter: erahm, Assigned: dmajor)

Attachments

(7 files, 1 obsolete file)

As a platform developer I'm interested in using a PGO build of clang (or gcc) that is trained on the Firefox codebase. I attempted to do this on my own and failed horribly, but it seems like something our build team could provide.

I don't need anything terribly special or smart, just a standard archive such as what llvm.org provides [1]. Personally I'd use this on Ubuntu 16.04 x86_64; I imagine a build for OSX would be welcome as well.

It's also possible this isn't worth the effort, so evaluating whether a PGO build of clang actually performs better should be a prerequisite.

[1] http://releases.llvm.org/download.html#3.9.0
I came across this, where they compiled FF with PGO GCC.
https://gcc.gnu.org/ml/gcc/2013-03/msg00210.html


"

Firefox:
vanilla:  5143.27s user 267.27s system 346% cpu 26:02.03 total
PGO    :  4590.37s user 270.21s system 344% cpu 23:28.89 total
LTO    :  5056.11s user 268.04s system 348% cpu 25:28.73 total
LTO+PGO:  4598.79s user 269.01s system 347% cpu 23:22.13 total

 * GCC build with PGO is ~10% faster than a vanilla bootstrapped compiler.
"

So, looks like a worthwhile optimisation.
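As a rough sanity check on the "~10%" figure, here is a minimal Python sketch that recomputes the relative savings from the timings quoted above. The helper names (`to_seconds`, `percent_saved`) are made up for illustration; only the numbers come from the quoted post.

```python
def to_seconds(clock: str) -> float:
    """Convert an 'MM:SS.ss' wall-clock string (as printed by `time`) to seconds."""
    minutes, seconds = clock.split(":")
    return int(minutes) * 60 + float(seconds)


def percent_saved(baseline: float, candidate: float) -> float:
    """Percentage of the baseline time saved by the candidate."""
    return (baseline - candidate) / baseline * 100


# (user CPU seconds, wall-clock total), copied from the quoted figures.
timings = {
    "vanilla": (5143.27, "26:02.03"),
    "PGO": (4590.37, "23:28.89"),
    "LTO": (5056.11, "25:28.73"),
    "LTO+PGO": (4598.79, "23:22.13"),
}

base_user, base_wall = timings["vanilla"]
for name, (user, wall) in timings.items():
    if name == "vanilla":
        continue
    user_saved = percent_saved(base_user, user)
    wall_saved = percent_saved(to_seconds(base_wall), to_seconds(wall))
    print(f"{name:8s} user: {user_saved:5.1f}%  wall: {wall_saved:5.1f}%")
```

For the PGO row, both the user-time and the wall-clock savings come out close to 10%, which is consistent with the quoted summary.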
Flags: needinfo?(erahm)
(In reply to mayankleoboy1 from comment #1)

That's great! Unfortunately I'm not the one to implement this; hopefully someone on the build team will take a look.
Flags: needinfo?(erahm)
Product: Core → Firefox Build System
FWIW, I got apparent improvements in build time by building my compiler from source, as well as building with PGO:
https://jdashg.github.io/misc/pgo-clang.html

I can back up a ~10% compile time advantage from PGO, though this was only a 5% total build time advantage for my -j32 machine.
Compiling with -O3 and -march=native was a much larger win, so I'm not sure distributing PGO binaries is the right solution.

I did not test -O3 without -march=native. This is worth following up on, but I expect the performance of my system compiler package (Arch Linux) to be similar to a build without -march=native.
Actually, in more controlled testing, I found PGO to be the largest win by far (12m25s -> 10m30s).
Summary: Provide PGO builds of clang/gcc → Provide PGO builds of clang
Assignee: nobody → dmajor

Separating out the mechanical/"boring" changes to make the next patch clearer. This patch adds the ability to build a fourth stage that, for now, doesn't do anything special.

I switched the comparisons to >= to make it more obvious that, e.g., "here is what's going to happen for stage 2"; the off-by-one was too hard on my brain.

Attachment #9137226 - Attachment is obsolete: true

This adds the ability to do four-stage PGO builds. This was surprisingly straightforward thanks to PGO being a well-supported scenario in LLVM's cmake.

For reference, the stages are:
stage1: Initial build with gcc
stage2: Instrumented build using stage1
stage3: Train by using the instrumented stage2 to build the clang tree
stage4: Optimize using the stage3 compiler and the profdata created with it
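The stage list above can be sketched as a small Python table of "which compiler builds this stage, with which extra cmake options". This is only an illustration, not the actual build-clang code: `stage_cmake_args` and `STAGE_COMPILER` are hypothetical names, and the choice of `LLVM_BUILD_INSTRUMENTED=IR` is an assumption. `LLVM_BUILD_INSTRUMENTED` and `LLVM_PROFDATA_FILE` are LLVM cmake options, and `LLVM_BUILD_RUNTIME=No` for the instrumented build is mentioned elsewhere in this bug.

```python
from typing import Dict, List, Optional

# Which compiler is used to build each stage (paths are illustrative only).
STAGE_COMPILER: Dict[int, str] = {
    1: "gcc",
    2: "stage1/bin/clang",
    3: "stage2/bin/clang",  # the instrumented compiler; running it emits profiles
    4: "stage3/bin/clang",
}


def stage_cmake_args(stage: int, profdata: Optional[str] = None) -> List[str]:
    """Return the extra cmake -D options for one stage (hypothetical helper)."""
    if stage in (1, 3):
        # Plain builds. Stage 3 exists to exercise the instrumented stage2
        # compiler on the clang tree, which is what produces the profile data.
        return []
    if stage == 2:
        # Instrumented build (assumed here to use IR-level instrumentation).
        return ["-DLLVM_BUILD_INSTRUMENTED=IR", "-DLLVM_BUILD_RUNTIME=No"]
    if stage == 4:
        if profdata is None:
            raise ValueError("stage 4 needs the merged profile data file")
        # Optimized build that consumes the profile gathered during stage 3.
        return [f"-DLLVM_PROFDATA_FILE={profdata}"]
    raise ValueError(f"unknown stage: {stage}")


if __name__ == "__main__":
    for stage in sorted(STAGE_COMPILER):
        args = stage_cmake_args(stage, profdata="merged.profdata")
        print(f"stage{stage}: built with {STAGE_COMPILER[stage]} {' '.join(args)}")
```

The raw profiles written during stage 3 have to be merged into a single file (for example with `llvm-profdata merge`) before stage 4 can use it; the file name `merged.profdata` above is a placeholder.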

Depends on D69079

Otherwise, PGO builds would fail to find asan at stage2 because the instrumented build uses LLVM_BUILD_RUNTIME=No.

Depends on D69080

This will partially atone for making builds longer with PGO.

Depends on D69618

Pushed by dmajor@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a64593d4c645
build-clang: pass is_final_stage even for stage1. r=glandium
https://hg.mozilla.org/integration/autoland/rev/441282fd1fea
build-clang: avoid building unnecessary things in intermediate stages. r=glandium
https://hg.mozilla.org/integration/autoland/rev/8adfc59eb3c5
build-clang: Install imports and asan symbols only in the final stage r=glandium
https://hg.mozilla.org/integration/autoland/rev/96701b9815c2
build-clang: Merge LLVM a84b200e604 to fix Windows PGO. r=glandium
https://hg.mozilla.org/integration/autoland/rev/493c338ad705
build-clang: Add support for 4-stage builds r=glandium
https://hg.mozilla.org/integration/autoland/rev/e04c57ed0c04
build-clang: Add support for PGO builds. r=glandium
https://hg.mozilla.org/integration/autoland/rev/810a84a18948
build-clang: Convert 3-stage builds to 4-stage PGO builds. r=glandium
Depends on: 1628036
Flags: needinfo?(dmajor)
Pushed by dmajor@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/084f9e2cdb83
build-clang: pass is_final_stage even for stage1. r=glandium
https://hg.mozilla.org/integration/autoland/rev/f4874ffeac44
build-clang: avoid building unnecessary things in intermediate stages. r=glandium
https://hg.mozilla.org/integration/autoland/rev/4f67a5a67abf
build-clang: Install imports and asan symbols only in the final stage r=glandium
https://hg.mozilla.org/integration/autoland/rev/3129eeee4887
build-clang: Merge LLVM a84b200e604 to fix Windows PGO. r=glandium
https://hg.mozilla.org/integration/autoland/rev/21780faba92c
build-clang: Add support for 4-stage builds r=glandium
https://hg.mozilla.org/integration/autoland/rev/475d78d8da12
build-clang: Add support for PGO builds. r=glandium
https://hg.mozilla.org/integration/autoland/rev/f7432af8daf3
build-clang: Convert 3-stage builds to 4-stage PGO builds. r=glandium
Blocks: 1628479

== Change summary for alert #25582 (as of Thu, 09 Apr 2020 02:25:56 GMT) ==

Improvements:

9% build times linux64 debug plain taskcluster-c5.4xlarge 1,339.46 -> 1,213.62
9% build times linux64 debug plain taskcluster-m5.4xlarge 1,407.98 -> 1,277.07
9% build times linux64 debug plain taskcluster-c5d.4xlarge 1,316.49 -> 1,203.20
8% build times windows2012-64 debug plain taskcluster-c4.4xlarge 2,679.84 -> 2,466.41
7% build times windows2012-aarch64 opt aarch64-no-eme nightly taskcluster-c5.4xlarge 2,287.62 -> 2,132.88
6% build times windows2012-32-shippable opt nightly taskcluster-c5d.4xlarge 2,266.06 -> 2,131.84
6% build times windows2012-aarch64 opt aarch64-no-eme nightly taskcluster-m5.4xlarge 2,393.91 -> 2,255.14
5% build times android-5-0-aarch64 pgo taskcluster-m5.4xlarge 2,350.24 -> 2,234.64

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=25582

(In reply to Alexandru Ionescu :alexandrui (needinfo me) from comment #17)

Congrats!

(it's reassuring that this matches our experiments!)
