Closed Bug 1464170 Opened 2 years ago Closed 2 years ago

Support PGO in clang builds on Linux

Categories

(Firefox Build System :: General, enhancement)

3 Branch
enhancement
Not set

Tracking

(firefox63 fixed)

RESOLVED FIXED
mozilla63
Tracking Status
firefox63 --- fixed

People

(Reporter: dmajor, Assigned: glandium)

References

Details

Attachments

(1 file, 1 obsolete file)

I got PGO working locally with clang on Linux. My WIP patch is quite horrible and needs cleanup, but I need to turn my attention elsewhere so I'm going to post what I have.
Comment on attachment 8980384 [details] [diff] [review]
WIP stopping point - not actively working on this

Review of attachment 8980384 [details] [diff] [review]:
-----------------------------------------------------------------

::: Makefile.in
@@ +311,4 @@
>  # No point in clobbering if PGO has been explicitly disabled.
>  ifndef NO_PROFILE_GUIDED_OPTIMIZE
>  maybe_clobber_profiledbuild: clean
> +	/home/vm/src/llvm.out/bin/llvm-profdata merge -o $(DEPTH)/merged.profdata `ls -S $(DEPTH)/*.profraw | head -n1`

The path to llvm-profdata should be made a variable that can be set from mozconfig, like LLVM_SYMBOLIZER.

Regarding the `ls -S $(DEPTH)/*.profraw | head -n1`: After the training run I got a bunch of .profraw files in my objdir with random numbers or hashes in the filename. One of them is orders of magnitude larger than the others. I assume it's the profile data for the main Firefox process (or libxul? I don't know if it's split by process or library.) So I figured I'd sort by size and take the first. Maybe we should let llvm-profdata merge all of the files instead? I don't know whether adding extra ones is harmful.

::: build/moz.configure/toolchain.configure
@@ +1237,5 @@
> +        return namespace(
> +            gen_cflags=['-fprofile-generate'],
> +            gen_ldflags=['-fprofile-generate'],
> +            use_cflags=['-fprofile-use=/home/vm/src/mc/obj/clang7pgo/merged.profdata',
> +                        '-Wcoverage-mismatch', '-Wno-error=coverage-mismatch'],

This inherited -Wcoverage-mismatch from the gcc block above, but it turned into -Werror. Maybe we should investigate the mismatch, I don't know if it's hurting us.

Also, I did not include -fprofile-correction from the gcc block, because clang warns that it doesn't recognize this switch.

We should probably use the LLVM_PROFILE_FILE env var rather than the '=' form of -fprofile-use. But it didn't work for me at first try. (I was probably confusing make variables and environment variables.)
Attachment #8980384 - Attachment description: WIP - not ready to land → WIP stopping point - not actively working on this
Blocks: 1464193
(In reply to David Major [:dmajor] from comment #2)
> Regarding the `ls -S $(DEPTH)/*.profraw | head -n1`: After the training run
> I got a bunch of .profraw files in my objdir with random numbers or hashes
> in the filename. One of them is orders of magnitude larger than the others.
> I assume it's the profile data for the main Firefox process (or libxul? I
> don't know if it's split by process or library.) So I figured I'd sort by
> size and take the first. Maybe we should let llvm-profdata merge all of the
> files instead? I don't know whether adding extra ones is harmful.

Reading https://clang.llvm.org/docs/UsersManual.html#profiling-with-instrumentation it looks like the default behavior for `-fprofile-generate` is to include unique IDs in each profraw file, where `-fprofile-instr-generate` does no. Either way it looks like you can set `LLVM_PROFILE_FILE` to explicitly override this, It supports putting a `%p` in the filename so that output from different pids doesn't clobber. The docs seem to indicate that the unique IDs you're seeing are per-shared-library, which is probably a PITA. GCC outputs coverage data next to each object file, and MSVC outputs files next to the binary. I'm not sure how you're supposed to map each profraw file back to the binary/shared library it came from with clang?
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #3)
> I'm not sure how you're supposed to map each profraw
> file back to the binary/shared library it came from with clang?

Eh, we can probably get 99% of the way there by just taking the largest one (libxul).
I don't have the exact numbers in front of me but talos results generally improved about 5% compared to clang non-PGO.
Keywords: in-triage
Keywords: in-triage
See Also: → 1437452
Maybe we can make some suggestions to upstream LLVM about how to make this easier to work with.
Assignee: nobody → mh+mozilla
Blocks: 1481721
Depends on: 1481719
This uses the same code path as for clang-cl builds.
Attachment #8980384 - Attachment is obsolete: true
Attachment #8998758 - Flags: review?(core-build-config-reviews)
Comment on attachment 8998758 [details] [diff] [review]
Support PGO in clang builds on Linux

\o/
Attachment #8998758 - Flags: review?(core-build-config-reviews) → review+
I'm surprised (but pleased) that you didn't run into https://bugs.llvm.org/show_bug.cgi?id=37422.
(In reply to David Major [:dmajor] from comment #9)
> I'm surprised (but pleased) that you didn't run into
> https://bugs.llvm.org/show_bug.cgi?id=37422.

I did.
Actually, I had a different issue.
Pushed by mh@glandium.org:
https://hg.mozilla.org/integration/mozilla-inbound/rev/3c4564c2f2d5
Support PGO in clang builds on Linux. r=dmajor
https://hg.mozilla.org/mozilla-central/rev/3c4564c2f2d5
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
Version: Version 3 → 3 Branch
You need to log in before you can comment on or make changes to this bug.