Closed Bug 1437452 Opened 2 years ago Closed 4 months ago

Consider doing PGO on rust code

Categories

(Firefox Build System :: General, enhancement)

enhancement
Not set

Tracking

(firefox70 fixed)

RESOLVED FIXED
mozilla70
Tracking Status
firefox70 --- fixed

People

(Reporter: emilio, Assigned: mwoerister)

References

(Blocks 4 open bugs)

Details

(Whiteboard: [stylo:p3])

Attachments

(1 file)

I don't know how much effort would this be, but this would be great to avoid manually optimizing hot stuff that the old style system gets for free due to PGO, like https://github.com/servo/servo/pull/19781, for example.
Blocks: 1422522
I don't think there's a supported way to do Rust PGO currently, is there?
Yeah, I guess you're right. I thought I had read on the matter, but https://github.com/rust-lang/rfcs/issues/1220 is still open, so I might have just dreamed about it.

There's https://github.com/Geal/pgo-rust, but it looks a somewhat tedious process to integrate.
(In reply to Emilio Cobos Álvarez [:emilio] from comment #2)
> There's https://github.com/Geal/pgo-rust, but it looks a somewhat tedious
> process to integrate.

That's probably sort of straightforward, considering that we already have all the LLVM tools installed for PGO builds because bindgen.
(In reply to Nathan Froyd [:froydnj] from comment #3)
> (In reply to Emilio Cobos Álvarez [:emilio] from comment #2)
> > There's https://github.com/Geal/pgo-rust, but it looks a somewhat tedious
> > process to integrate.
> 
> That's probably sort of straightforward, considering that we already have
> all the LLVM tools installed for PGO builds because bindgen.

Ah, interesting! Do you have an example of how to do that? It seems we need to link all libraries manually with clang, which looks somewhat unfortunate.

Which kind of tooling we use for C++? Maybe I can write a cargo script that automates it to some extent.
Flags: needinfo?(nfroyd)
(In reply to Emilio Cobos Álvarez [:emilio] from comment #4)
> (In reply to Nathan Froyd [:froydnj] from comment #3)
> > (In reply to Emilio Cobos Álvarez [:emilio] from comment #2)
> > > There's https://github.com/Geal/pgo-rust, but it looks a somewhat tedious
> > > process to integrate.
> > 
> > That's probably sort of straightforward, considering that we already have
> > all the LLVM tools installed for PGO builds because bindgen.
> 
> Ah, interesting! Do you have an example of how to do that? It seems we need
> to link all libraries manually with clang, which looks somewhat unfortunate.

With "how to do that" I mean how to integrate on FF's build system. Pointers to where we do this for C++ would be nice.
(In reply to Emilio Cobos Álvarez [:emilio] from comment #4)
> Which kind of tooling we use for C++? Maybe I can write a cargo script that
> automates it to some extent.

We use our C++ compilers' native facilities: -LTCG:PGINSTRUMENT / -LTCG:PGOPTIMIZE in MSVC, and -fprofile-generate / -fprofile-use in GCC (we don't currently do any builds with clang and PGO).
We do a two-pass build for PGO, setting MOZ_PROFILE_GENERATE for the first pass and MOZ_PROFILE_USE for the second, so you can see most of the interesting bits by searching for those:
https://dxr.mozilla.org/mozilla-central/search?tree=mozilla-central&q=MOZ_PROFILE_GENERATE
https://dxr.mozilla.org/mozilla-central/search?q=MOZ_PROFILE_USE&redirect=true
Thanks Ted!

IIUC this is the relevant bit for C++: https://searchfox.org/mozilla-central/rev/b9f1a4ecba48b2d8c686669e32d109c40e927b48/build/moz.configure/toolchain.configure#1232

Doing it for rust would be sort of annoying unless we allow rustc to specify the profile flags, I think, because you need to track the llvm bytecode files manually and link with clang... I'll try to reach out to see if there's a good way to support that in rustc itself.
Flags: needinfo?(nfroyd)
Hmm,

RUSTFLAGS="-Cllvm-args=-profile-generate"

Seems to do something, even though it fails to link with undefined reference to `__llvm_profile_instrument_target'
Trying to make this work without toolchain support sounds like it's going to be finicky and fragile, honestly.

n.b., we do have some build system code that does some massaging of profile data in between the passes, because we run the browser from `dist/firefox` (which is where we stage the contents of the package that gets shipped), so MSVC's instrumentation code writes the profiling data there, but the linker expects to find it next to the original binaries:
https://dxr.mozilla.org/mozilla-central/rev/9a0655ea8ae02f4d96bf23a607a94641f1c57f1b/config/rules.mk#486

It's possible you could do something similar for Rust code to smooth over some rough edges.
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #10)
> Trying to make this work without toolchain support sounds like it's going to
> be finicky and fragile, honestly.

I agree. I talked a bit with Alex over #rustc. I noticed that rustc already supports pulling in the correct runtime bits via -Z profile. that means that a binary generated with:

  RUSTFLAGS="-Cllvm-args=-profile-generate" rustc -g -Z profile

Does link correctly. It does _not_ generate the right profiling data though (I suspect -Z profile does force the gcov-style profiling data and clobbers the PGO stuff).

Anyway, I'm looking into adding the right options to rustc right now.
Whiteboard: [stylo:p3]
I looked into this this weekend:

  https://github.com/rust-lang/rust/pull/48346

Should allow us to generate profiles and use them the same way as clang, I'd think.
Assignee: nobody → emilio
Product: Core → Firefox Build System
This should be now pretty similar to bug 1464170, just setting RUSTFLAGS instead.
See Also: → 1464170
In https://treeherder.mozilla.org/#/jobs?repo=try&revision=86a50eba1de6dec84ccef88b359640250ec63d6f, I tried passing `-Z pgo-gen=...` during the instrumentation phase, but it failed with:
INFO -  error: failed to load bc of "rsdparsa_capi0-21966efb184e55d21cbcc3262c281707.rs"

That message is apparently https://dxr.mozilla.org/rust/rev/ccb550fb455eac46098b352875ba387eb982919d/src/librustc_codegen_llvm/back/lto.rs#270 but I have absolutely no idea how to read that code.
(In reply to David Major [:dmajor] from comment #14)
> In
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=86a50eba1de6dec84ccef88b359640250ec63d6f, I tried
> passing `-Z pgo-gen=...` during the instrumentation phase, but it failed
> with:
> INFO -  error: failed to load bc of
> "rsdparsa_capi0-21966efb184e55d21cbcc3262c281707.rs"
> 
> That message is apparently
> https://dxr.mozilla.org/rust/rev/ccb550fb455eac46098b352875ba387eb982919d/
> src/librustc_codegen_llvm/back/lto.rs#270 but I have absolutely no idea how
> to read that code.

I'd try with -C lto=thin to see if it makes a difference... Michael, have you seen something like this before? Any chance you can guess what's going on?

Also, I assume this is Windows, right?
(In reply to Emilio Cobos Álvarez (:emilio) from comment #15)
> I'd try with -C lto=thin to see if it makes a difference... 

15:03:16     INFO -  error: lto can only be run for executables, cdylibs and static library outputs

> Also, I assume this is Windows, right?

Right.
> 15:03:16     INFO -  error: lto can only be run for executables, cdylibs and static library outputs

That was when it was building:

'z:/build/build/src/rustc/bin/rustc.exe' --crate-name geckoservo 'servo\ports\geckolib\lib.rs' --crate-type staticlib --crate-type rlib
Oh, so it got past comment 14...
Well, the error message from comment 14 comes from a fat-lto-only codepath, so setting lto=thin does avoid it. But I don't know which error is worse.
Michael, is this something that you might be able to help with?
Flags: needinfo?(mwoerister)
Yes, I'll take a look!
Flags: needinfo?(mwoerister)
It looks like the geckoservo crate is built as a staticlib and an rlib at the same time (https://searchfox.org/mozilla-central/rev/2466b82b729765fb0a3ab62f812c1a96a7362478/servo/ports/geckolib/Cargo.toml#10). While this is in theory supported, it is bound to cause problems. From what I could gather, the staticlib version is not needed anyway. I recommend just building the crate as an rlib. 

David, did you maybe add -Clto=thin to RUSTFLAGS instead of cargo_rustc_flags in rules.mk? That would lead to the flag being passed to *every* rustc invocation, but we only want it to be passed to the final invocation for gkrust.
(In reply to Michael Woerister from comment #22)
> David, did you maybe add -Clto=thin to RUSTFLAGS instead of
> cargo_rustc_flags in rules.mk? That would lead to the flag being passed to
> *every* rustc invocation, but we only want it to be passed to the final
> invocation for gkrust.

Oh, I didn't know that, thanks for the pointer! I'll try the latter when I get a chance.
Latest issue: My build is not producing .profraw files during training. I had:

+ifdef MOZ_PROFILE_GENERATE
+cargo_rustc_flags  += -Z pgo-gen=$(topobjdir)/rust.profraw
+endif

But I don't see the string `rust.profraw` anywhere in xul.dll.

Alex, I wonder if this change is related? Does Windows not support pgo-gen or something? https://github.com/rust-lang/rust/commit/42eb85002ae4bda9899216805d03fa7d77279ede#diff-325a9450547647e0cb5f4f1f54224af8
Flags: needinfo?(acrichton)
Presumably something added changed how libprofiler_rt worked on windows... We need to do some linker magic so LTO doesn't eat the strings:

  https://github.com/rust-lang/rust/pull/48346/files#diff-44a9bc4d6f529c8d0b40b42830e04025R230

Does using:

  LLVM_PROFILE_FILE=rust.profraw

work? If not, do the profiler_rt symbols appear? See:

  https://github.com/rust-lang/rust/pull/48346/files#diff-5aefd510b19aac3d1225bb27fbf7aac1R285

For what I had to do on Linux due to an optimization that clang introduced.
I guess if the LLVM_PROFILE_FILE string appears on the binary, then LTO is removing the weak symbols. Otherwise the profiler runtime isn't there, which would be somewhat weird.
Another option is that it's conflicting with the clang PGO options, and that since it uses the same weak symbol, the profiled data ends up in the same profile as the clang one?

I'd expect you to get a lot of warnings if you try -Z pgo-use the clang file and the profile data for the rust code isn't there.
Hmm, I do see LLVM_PROFILE_FILE in the binary, and setting it to foo.profraw does create such a file, but I _think_ it's all from the C++ PGO. I don't see any function names that are obviously rusty.

Maybe I should disable C++ PGO to remove any potential distractions while we sort this out. Unfortunately that means this will have to wait until next week.
Ah unfortunately I wasn't actually sure what went wrong during the LLVM 7 upgrade which needed PGO for windows in that test to be turned off. I'm not too familiar with PGO (or it on Windows). If a fix is found though which we need on our end it should be easy to apply!
Flags: needinfo?(acrichton)
With C++ PGO disabled, I don't see "profraw" or "LLVM_PROFILE_FILE" anywhere in xul.dll. I can only conclude that `-Z pgo-gen` isn't taking effect.

From what I've been able to gather over the past few days it looks like the PGO method that FF currently uses (front-end based PGO) and the method that rustc supports (IR-level PGO) might not be compatible. They use the same runtime infrastructure from clang_rt.profile.lib but there are subtle differences and Clang's documentation seems to imply that either one or the other should be used.

I think it would be a good idea to test whether we can switch to IR-level PGO for C++ code too, that is, use -fprofile-generate instead of -fprofile-instr-generate. Supporting front-end based instrumentation in rustc would potentially be a lot of work and in theory IR-level instrumentation should be more flexible anyway.
(I'm also a bit surprised that the current setup in FF works at all because, from what I've seen so far, it seems like profiling data in default.profraw might be overwritten multiple times instead of being appended to).

So it would be good to benchmark the following combinations against each other:

  1. The current setup with -fprofile-instr-generate.
  2. An adapted version with -fprofile-instr-generate="default-%m.profraw" with data merging explicitly enabled.
  3. A version with -fprofile-generate (i.e. IR-level profiling)
  4. A version with -fprofile-generate and -mllvm -disable-preinline (because Emilio found the pre-inlining pass to be detrimental sometimes)

Some background:

  • LLVM's profiling instrumentation works via adding certain instrinsics to LLVM IR
  • Both front-end and IR-level instrumentation use the same instrinsics and runtime but for the former Clang emits the instrinsics while for the latter they are emitted by an internal LLVM pass during optimization
  • IR-level instrumentation can lead to instrumented binaries that run with less overhead because it can run an inlining pass before adding the intrinsics (see http://lists.llvm.org/pipermail/llvm-dev/2015-August/089044.html)

(In reply to Michael Woerister from comment #31)

I think it would be a good idea to test whether we can switch to IR-level
PGO for C++ code too, that is, use -fprofile-generate instead of
-fprofile-instr-generate.

Last I tried, Windows clang-cl only supports the "-instr" variety. I haven't looked closely enough to see whether this is just a lack of flag plumbing, or something deeper.

Perhaps you could start out trying your experiments on Linux, to see whether switching to -fprofile-gen is an improvement there. If so, we can cross the Windows bridge at that point.

Another thing to consider is that the recently-landed context-sensitive PGO (aka "5 tier") only supports -fprofile-gen currently. Again I haven't looked closely enough to see whether it's just flag plumbing or something more architectural. But if you can switch us over to its profiling style, all the better. :-)

Last I tried, Windows clang-cl only supports the "-instr" variety.

Looks like your right :)
https://treeherder.mozilla.org/#/jobs?repo=try&revision=471e6757d94adebc9817b101a04682f4c368d1ae&selectedJob=234137475

I haven't looked closely enough to see whether this is just a lack of flag plumbing

I suspect it is. But I've been wrong before.

Another thing to consider is that the recently-landed context-sensitive PGO (aka "5 tier")

Oh that looks interesting. I noticed the -fcs-profile-generate flag showing up in the documentation recently.

I just remembered that https://reviews.llvm.org/D53457 exists, and clang-cl seems to work with -clang:-fprofile-generate!

I tried it on a very simple program and I got a file named "default_[huge-number]_0.profraw" which suggests that we are indeed taking the IR-level profiling paths.

Well that didn't last long. clang-cl crashed on process_thread_interception.cc during the instrumentation phase, likely due to that file's use of SEH. I'll attempt to reduce it...

I just remembered that https://reviews.llvm.org/D53457 exists, and clang-cl seems to work with -clang:-fprofile-generate!

Ah, that's good so we are not blocked an that.

Well that didn't last long. clang-cl crashed on process_thread_interception.cc during the instrumentation phase, likely due to that file's use of SEH. I'll attempt to reduce it...

It looks like FF is being built with -C panic=abort, so this shouldn't be a problem for the Rust part of the code at least.

Except libstd is not built with -C panic=abort...

For code coverage, we worked around the problem by disabling instrumentation in functions using SEH.

Except libstd is not built with -C panic=abort...

I wonder if the fat-LTO step we are already doing helps here: https://github.com/rust-lang/rust/blob/f47ec2ad5b6887b3d400aee49e2294bd27733d18/src/rustllvm/PassWrapper.cpp#L709

This pass is run after libstd has been merged in.

Hm, it seems that the maybe_clobber_profiledbuild step does not delete Rust artefacts. And surprisingly cargo doesn't pick up on the changed RUSTFLAGS either.

I filed https://bugs.llvm.org/show_bug.cgi?id=41279 for the SEH crash, but it turns out that it doesn't really matter to us. We need to disable PGO for the sandbox interceptors anyway, because they can be called before the child process has initialized its import table, so the instrumentation code crashes.

Depends on: 1541068
Depends on: 1542746

I tried reproducing Michael's results (with an older base tree, so maybe an older Rust?) and got:

error: /builds/worker/workspace/build/src/obj-firefox/dom/serviceworkers/test/gtest/Unified_cpp_test_gtest0.cpp: Function control flow change detected (hash mismatch) main [-Werror,-Wbackend-plugin]

on Linux builds. I mean, adding PGO instrumentation to our gtest code seems pretty pointless, but I don't know why the error appears on my builds and not Michael's.

The good news is that disabling pgo for some sandbox files and using the /clang trick dmajor mentioned in comment 34 works to get us through the profile-generate phase. The bad news is that Windows builds now fall over with a similar error as comment 42:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=a546ec8b70d13331c4720dbfa0a83d4cf95e6c3e

error: z:/task_1554908694/build/src/obj-firefox/xpcom/base/Unified_cpp_xpcom_base1.cpp: Function control flow change detected (hash mismatch) printf_stderr [-Werror,-Wbackend-plugin]

That was based off of m-c as of an hour or so ago, so we should be using the latest clang 8. No results yet from the Linux builds.

I am not sure what is going on. Maybe we need to disable -Wbackend-plugin on pgo builds, just like we already disable profile-instr-*?

OK, so adding -Wno-error=backend-plugin to PGO use flags seems to have silenced the errors. It's still not clear to me why I was seeing these and Michael was not; maybe will have to dig into compiler versions and/or exact compiler patches.

Removing bits about setting MOZ_PROFILE_ORDER_FILE gives us a successful Windows build, too! So that takes care of knowing that we can switch to IR-level instrumentation; now to integrating all the necessary Rust bits on all platforms.

This might also be interesting: The following build was a test for Windows (which failed as expected) and macOS (which succeeded):
https://treeherder.mozilla.org/#/jobs?repo=try&revision=086c9004d2d86a549579c10c3b4978f84d5511bf

The interesting point about it is that it used Rust 1.34-beta instead of nightly, so that already is compatible with Clang 8.

Huh, I just realized that the macOS build should have failed because Rust beta doesn't accept unstable flags like -Zpgo-gen. But it looks like there is no PGO done at all for shippable macOS builds. Is that correct?

Assignee: emilio → mwoerister

(In reply to Michael Woerister from comment #47)

Huh, I just realized that the macOS build should have failed because Rust beta doesn't accept unstable flags like -Zpgo-gen. But it looks like there is no PGO done at all for shippable macOS builds. Is that correct?

That's correct, we're working on macOS PGO support in bug 1528374 -- it needs to be 3-stage due to cross-compiling on Linux, so it's a bit more involved.

Do you know if there are plans to stabilize -Zpgo-gen? I found a rust-lang RFC but it hasn't been updated in a while.

Flags: needinfo?(mwoerister)
Blocks: 1543852
Blocks: 1543853
Blocks: 1543854

I just realized that I never posted the link to the PGO stabilization tracking issue, so here goes: https://github.com/rust-lang/rust/issues/59913

Flags: needinfo?(mwoerister)

Current Status

All of our identified blockers are now fixed. I've done try runs (linux, win) using nightly rust with somewhat unimpressive results (ignore the mac numbers, we don't pgo mac), which leads me to believe we're not actually consuming the rust profile data properly.

Next Steps

Firefox

  1. Verify that rust profiling data is generated
  2. Verify that rust profiling data is consumed
  3. Put together proper patches to enable Rust PGO (example of what I tried)

Rust

  1. PGO needs to be stabilized so that we can use the PGO flag (our policy doesn't allow us to use nightly rust builds right now). The compiler team has approved the change and it's just pending landing.
  2. An overly aggressive compiler error needs to be fixed
  3. We need to patch various libraries that now Werror on Rust 1.37 due to new warnings.

Release Planning

We currently have a policy that requires that end-user builds of Firefox use release builds of Rust -- we're allowed to use beta rust on one platform. Additionally Rust has a requirement that unstable flags (pgo in this case) can only be used in Nightly Rust. This means that we have to wait for a stabilized PGO flag to hit Rust Beta before we can test this on our Nightly users.

Rust 1.37 will hit beta on July 4th, 2019, release on August 15th. Our next Firefox 69 will move to beta on July 8th, release on September 2nd. Firefox 70 hits beta on September 2nd, release on October 21st. My proposal is:

  1. July 8th - Update our Windows nighly builds to use Rust 1.37 beta. If we haven't had a chance to fix all new Rust warnings we can disable warnings as errors for shippable windows builds as a temporary workaround.
  2. August 15th - Update all builds to use Rust 1.37 release.

We can either let this ride the trains on 70 or push for an uplift to 69 beta.

which leads me to believe we're not actually consuming the rust profile data properly.

FWIW, the profile data does contain information for rust code, and the profile-use build does pass the right flag with the right path to the merged profile.

which leads me to believe we're not actually consuming the rust profile data properly.

A couple ideas to check that - Disassemble some panic codepaths before and after, since PGO ought to have high confidence about them being cold; or do a SymbolSort diff to look for dramatic changes in function size due to different inlining decisions.

A couple ideas to check that - Disassemble some panic codepaths before and
after, since PGO ought to have high confidence about them being cold; or do
a SymbolSort diff to look for dramatic changes in function size due to
different inlining decisions.

I didn't do this fully rigorously, but informally: in a before-and-after view of your try pushes, there is a lot of shuffling-about of Rust code, and servo glue functions did generally get bigger, for example by inlining xul!alloc::string::{{impl}}::to_string<nsstring::nsAString> in several places.

It seems safe to say that your flags weren't just ignored (which agrees with glandium's Comment 51), though I couldn't say whether or not PGO is really achieving its full potential.

Come to think of it, is any Rust code besides stylo even exercised in our PGO training?

(In reply to David Major [:dmajor] from comment #53)

A couple ideas to check that - Disassemble some panic codepaths before and
after, since PGO ought to have high confidence about them being cold; or do
a SymbolSort diff to look for dramatic changes in function size due to
different inlining decisions.

I didn't do this fully rigorously, but informally: in a before-and-after view of your try pushes, there is a lot of shuffling-about of Rust code, and servo glue functions did generally get bigger, for example by inlining xul!alloc::string::{{impl}}::to_string<nsstring::nsAString> in several places.

Thanks for checking, this is really helpful!

It seems safe to say that your flags weren't just ignored (which agrees with glandium's Comment 51), though I couldn't say whether or not PGO is really achieving its full potential.

I feels like we need to get someone from the rust compiler team involved unless you have other thoughts on how to verify rust PGO is behaving as expected.

Come to think of it, is any Rust code besides stylo even exercised in our PGO training?

I filed bug 1543852, bug 1543853, and bug 1543854 to increase our training set. We could possibly do a local profiling run that ad hoc loads some of those to see if we get better numbers if we just want to see if they help before doing the full integration with automation.

Blocks: 1560972

This change is similar to what we do for Rust LTO, except that we
don't have a MOZ_PGO=cross or similar setting. We assume that if you
want to do PGO, you also want to PGO the Rust code.

Depends on: 1564638
Depends on: 1565757

Nathan, is it possible to do a linux-only version of your patch that we could land alongside updating our linux builds to Rust 1.37 beta?

Flags: needinfo?(nfroyd)

(In reply to Eric Rahm [:erahm] from comment #56)

Nathan, is it possible to do a linux-only version of your patch that we could land alongside updating our linux builds to Rust 1.37 beta?

Sure, we can limit the addition of the flags to Linux platforms only. Would that be more desirable for the time being?

Flags: needinfo?(nfroyd)

(In reply to Nathan Froyd [:froydnj] from comment #57)

(In reply to Eric Rahm [:erahm] from comment #56)

Nathan, is it possible to do a linux-only version of your patch that we could land alongside updating our linux builds to Rust 1.37 beta?

Sure, we can limit the addition of the flags to Linux platforms only. Would that be more desirable for the time being?

Yeah, perhaps as another blocker to this bug dependent on bug 1564638. I'd like to get rust PGO enabled and tested on at least one platform, once we get Linux builds on the 1.37 beta we'll be able to use the stabilized PGO flag, but passing that flag to other platforms would break.

Attachment #9075681 - Attachment description: Bug 1437452 - turn on PGO options for Rust code; r=#build → Bug 1437452 - add makefile bits for cross-language PGO; r=#build
Depends on: 1569709
Pushed by nfroyd@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/d3bc5e363369
add makefile bits for cross-language PGO; r=glandium
Status: NEW → RESOLVED
Closed: 4 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla70
Depends on: 1571836
Depends on: 1574275
Blocks: 1602568
You need to log in before you can comment on or make changes to this bug.