Closed Bug 1641674 Opened 5 years ago Closed 4 years ago

Linker section GC is throwing away lots of PGO data

Categories

(Firefox Build System :: Toolchains, defect)

defect

Tracking

(firefox82 fixed)

RESOLVED FIXED
82 Branch
Tracking Status
firefox82 --- fixed

People

(Reporter: away, Assigned: away)

References

Details

Attachments

(2 files)

While investigating clang-10 performance regressions, I noticed that a large number of functions are missing from our merged.profdata, even though those functions are clearly instrumented in disassembly, and we definitely execute them during training.

After stepping through llvm-profdata show and deciphering the file format from InstrProfData.inc, I found that our missing functions are dumping their counter arrays to the .profraw files as expected. But the tooling does not iterate over those arrays directly. Instead, the master "table of contents" is the list of __llvm_profile_data objects at the beginning of the file, which have pointers to the counter arrays and other per-function information.

Those data objects were missing because their corresponding __profd_ entries in the compiled object files were dropped by linker GC. So we ended up with a bunch of data in our profiles that was unreachable from the ToC.

Disabling --gc-sections during MOZ_PROFILE_GENERATE results in huge improvements. Using the webaudio benchmark, which has tight C++ loops and is therefore very sensitive to compiler optimization, the previous clang-10 regressions turn into solid improvements.

I tried to run this under lld because I already have a debug build of that, but it turns out that the problem doesn't reproduce there. The excessive GC seems to be something specific to ld, that was made worse by clang-10.

Here's the line count of -Wl,--print-gc-sections output:

clang-9 + ld:    580200
clang-9 + gold:    5463
clang-9 + lld:     5467
clang-10 + ld:   857670
clang-10 + gold:   5460
clang-10 + lld:    5464

Given that this isn't a problem under lld, I'm much less inclined to debug it any further, and more inclined to just take the --no-gc-sections workaround.

Have you tried to get a small reproduction? That would be useful in order to file an upstream bug (and also to check if it's still an issue in newer versions of binutils, I'm not sure we're on the last release).

I did, using a stripped-down version of the GainNodeEngine code from webaudio that first caught our attention in perfherder. I tried to match the code structure as closely as possible in case linkage and visibility of the functions make a difference. No luck. At this point I can't justify spending more time on it.

Do you have a list of the GCed sections? (from -Wl,--print-gc-sections)

Attached file printgc10ld.tar.xz

For purposes of this bug, the removals that really hurt are the profd symbols.

So... I took a quick look and ... I can't even figure out why some of those profd symbols were not GC'ed before or why lld wouldn't GC them.
, because it looks to me they should be. Like, if I look at __profd__ZNK7mozilla3dom14GainNodeEngine19SizeOfExcludingThisEPFmPKvE.12884901887, I don't see it used anywhere that should prevent the linker from GC'ing it, and more importantly, I don't see what the difference is between the clang 9 and clang 10 output that makes ld GC it now but not previously.

Assignee: nobody → dmajor

For not-well-understood reasons, ld's --gc-sections discards a large number of the PGO bookkeeping structures that enable us to keep track of function counters, and the effect gets worse in object files generated by clang-10.

As much as I'd like to understand this better, the investigations take way too much time. As a path of least resistance, we can disable --gc-sections for the instrumentation phase of PGO builds. It won't harm anything since users never see those builds, and it will improve the performance of the optimized phase greatly.

There's a r+ patch which didn't land and no activity in this bug for 2 weeks.
:dmajor, could you have a look please?
For more information, please visit auto_nag documentation.

Flags: needinfo?(dmajor)
Flags: needinfo?(dmajor)
Pushed by dmajor@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/9bf1c5e44a13 Don't use --gc-sections during profile generation r=froydnj
Pushed by rmaries@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/0ba092f930cc Don't use --gc-sections during profile generation r=froydnj
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 82 Branch
Status: RESOLVED → REOPENED
Flags: needinfo?(dmajor)
Resolution: FIXED → ---
Target Milestone: 82 Branch → ---
Pushed by dmajor@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/dc0943c6e393 Don't use --gc-sections during profile generation r=froydnj
Flags: needinfo?(dmajor)
Status: REOPENED → RESOLVED
Closed: 4 years ago4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 82 Branch
Regressions: 1663138
Regressions: 1663139
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: