Closed Bug 1486042 Opened 6 years ago Closed 5 years ago

Perform ThinLTO across C++/Rust language boundaries

Categories

(Firefox Build System :: General, defect)

defect
Not set
normal

Tracking

(firefox70 fixed)

RESOLVED FIXED
mozilla70
Tracking Status
firefox70 --- fixed

People

(Reporter: mwoerister, Assigned: froydnj)

References

Details

Attachments

(2 files, 2 obsolete files)

Quite a bit of work has gone into making the Rust compiler support linker-plugin based LTO [1]. At the moment, this seems to work when building Firefox locally on Windows with Clang and a nightly version of Rust -- that is, I can confirm instances of Rust functions being inlined into C++ call sites.

However, in a try build David Major and I have done, no inlining seems to have happened (although we should confirm that WinDbg wasn't being too smart here somehow and used debuginfo to show an inlined call like a regular one). 

The builds in question are:
1. A baseline build for comparison. It uses thin (instead of full) LTO for the Rust code [2]
2. A build that adds cross-language LTO in addition to the baseline [3]

The next steps are:
- Confirm that there's actually something wrong with the try-build above, i.e. that no inlining is happening although it should.
- Find out what the problem is.
- Do proper benchmarking on Talos in order to find out if cross-language LTO is actually a win (I could not determine this conclusively with my local builds).

[1] https://github.com/rust-lang/rust/issues/49879
[2] https://treeherder.mozilla.org/#/jobs?repo=try&revision=47311259165639022843e847ca5ceaec010f76e9
[3] https://treeherder.mozilla.org/#/jobs?repo=try&revision=c5cf2d38fb922f7d3a6447cb97b5a10db1acb9f6
> 1. A baseline build for comparison. It uses thin (instead of full) LTO for the Rust code [2]
The baseline build uses full LTO. Thin LTO doesn't work without cross-lang, for reasons I don't remember. Maybe it was the DLL import stuff?
Oh right, I remember now that you told me that. Thanks for the reminder, David!
I took another look at the c5cf2d38fb922f7d3a6447cb97b5a10db1acb9f6 build above and it looks like cross-language inlining is happening for the "Windows 2012 x64 opt" build but not for the "Windows 2012 x64 pgo" build:

OPT:
> nsSMILCSSProperty::IsPropertyAnimatable:
>   00000001829FA860: 56                 push        rsi
>   00000001829FA861: 48 83 EC 20        sub         rsp,20h
>   00000001829FA865: 89 CE              mov         esi,ecx
>   00000001829FA867: E8 24 74 DF 01     call         _ZN5style10properties19animated_properties29nscsspropertyid_is_animatable17ha3d2a6ab393e9968E
>   00000001829FA86C: 84 C0              test        al,al

PGO:
> nsSMILCSSProperty::IsPropertyAnimatable:
>  00000001827738E0: 56                 push        rsi
>  00000001827738E1: 48 83 EC 20        sub         rsp,20h
>  00000001827738E5: 89 CE              mov         esi,ecx
>  00000001827738E7: E8 14 7D 78 01     call        Servo_Property_IsAnimatable
>  00000001827738EC: 84 C0              test        al,al
>  00000001827738EE: 74 1D              je          000000018277390D

So it's likely not something on CI that's causing the problem but rather the interaction between PGO and cross-lang LTO.
Here's an update on this:

- The latest tests where done with a rather recent Clang and rustc (both end of Oct 2018). Both are based on LLVM 8.0svn which is required in order for them to be compatible.
- xLTO (which is what I'll call cross-language LTO from now on) seems to work just fine in conjunction with PGO builds now. 
- I've been able to solve the problems around combining full LTO (for all Rust code) and ThinLTO (for everything, C++ and Rust together). This means that we now have comparable configurations for evaluating performance.

One of the goals of xLTO is that it hopefully allows for not duplicating logic in both Rust and C++ because either version can be called from the other without negative performance implications. With this in mind, and with help from Emilio, I've removed some code duplication in the gecko wrapper in Stylo (see https://hg.mozilla.org/try/rev/33746f477ee4). 

The following try-builds were benchmarked:

1. [xLTO off, manual inlining off]
   https://treeherder.mozilla.org/#/jobs?repo=try&revision=0f6e7cb45dbe0934e43263058d52ae15988441c9
2. [xLTO off, manual inlining ON]
   https://treeherder.mozilla.org/#/jobs?repo=try&revision=fa86f94963255a7bff3bc6fef741736b16ab1b92
3. [xLTO ON, manual inlining off]
   https://treeherder.mozilla.org/#/jobs?repo=try&revision=ca457e1b9ef3e473bc9f23de5a2e0dacb7dc8364
4. [xLTO ON, manual inlining ON]
   https://treeherder.mozilla.org/#/jobs?repo=try&revision=b0dba7a6cee4d2c3651341bd6af5d51576d1d953

I think some interesting comparisons are:

1. The effect of manual inlining (without xLTO) - This is interesting because it should give an idea what kind of performance difference manual inlining gives in general. It seems to have mostly positive but also some negative effects.
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=0f6e7cb45dbe0934e43263058d52ae15988441c9&newProject=try&newRevision=fa86f94963255a7bff3bc6fef741736b16ab1b92&framework=1&showOnlyConfident=1 

2. The effect of xLTO (without manual inlining) - This is interesting because it shows what kind of improvements xLTO can bring in general. It looks like it slightly improves performance in many cases, while a few regress slightly. However, many of the changes could also just be noise.
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=0f6e7cb45dbe0934e43263058d52ae15988441c9&newProject=try&newRevision=ca457e1b9ef3e473bc9f23de5a2e0dacb7dc8364&framework=1&showOnlyConfident=1

3. Finally: How does a build with xLTO and no manual inlining compare to the current state (no xLTO, code duplication) - This is the most likely configuration to replace the current one (i.e. replace manual work with a compiler optimizations). It looks like only the "tabpaint-from-parent pgo e10s stylo" benchmark is affected negatively, while some others show slight improvements.
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=fa86f94963255a7bff3bc6fef741736b16ab1b92&newProject=try&newRevision=ca457e1b9ef3e473bc9f23de5a2e0dacb7dc8364&framework=1&showOnlyConfident=1

So, judging from the numbers from (3), using xLTO might be a viable option to reduce the maintenance burden of performance sensitive code crossing the Rust/C++ boundary. It would be interesting though to take a closer look at that one regression. It looks like an important one?
(In reply to Michael Woerister from comment #4)
> 3. Finally: How does a build with xLTO and no manual inlining compare to the
> current state (no xLTO, code duplication) - This is the most likely
> configuration to replace the current one (i.e. replace manual work with a
> compiler optimizations). It looks like only the "tabpaint-from-parent pgo
> e10s stylo" benchmark is affected negatively, while some others show slight
> improvements.

Mike, when looking at the effects of cross-language LTO we're seeing a regression in just tabpaint-parent [1]. Do you have any thoughts on how to look into this further and whether or not that would be an acceptable regression?

[1] https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=fa86f94963255a7bff3bc6fef741736b16ab1b92&newProject=try&newRevision=ca457e1b9ef3e473bc9f23de5a2e0dacb7dc8364&framework=1&showOnlyConfident=1
Flags: needinfo?(mconley)
So, just to re-synthesize, the change being proposed is link-time optimization, and when you compare against mozilla-central without link-time optimization, you're seeing a regression in Windows 10 tabpaint regression but only on PGO.

I'm afraid this has all of the hallmarks of something that I'm not really equipped to diagnose. :/ PGO-only regressions are notoriously difficult to get to the root cause of, at least in my books. I know that some of the folks working with vchin have been looking at PGO optimizations lately... perhaps they'd have more insight here?

Transferring ni? to acreskey.
Flags: needinfo?(mconley) → needinfo?(acreskey)
To clarify, the "tabpaint opt e10s stylo" (i.e. non-pgo) benchmark shows roughly the same regression. It's just filtered out in the given link because of lower confidence.

Looking at the graph [1], the benchmarks seems to be bi-modal. So the regression might not be a real one?


[1] https://treeherder.mozilla.org/perf.html#/graphs?highlightedRevisions=fa86f9496325&highlightedRevisions=ca457e1b9ef3&series=%5B%22try%22,%22a59bffa7d0919fe8fc8b4b4aca26f3b8b81de87d%22,1,1%5D&timerange=2592000
Going to find a perf sheriff to give his view.
Flags: needinfo?(acreskey)
But it's been suggested that I look into the replicates, let me take a look.
Flags: needinfo?(acreskey)
(In reply to Michael Woerister from comment #7)
> https://treeherder.mozilla.org/perf.html#/
> graphs?highlightedRevisions=fa86f9496325&highlightedRevisions=ca457e1b9ef3&se
> ries=%5B%22try%22,%22a59bffa7d0919fe8fc8b4b4aca26f3b8b81de87d%22,1,
> 1%5D&timerange=2592000

Note that it's generally not a great idea to look at performance graphs on try, because they depend very much on what kind of things is pushed to try, and *very* regularly, sheriffs push mozilla-central with the beta configuration, which has very different performance characteristics. That alone could account for bimodality of performance graphs. That being said, in this particular case, the same graph, but for mozilla-inbound is just as bimodal.
(In reply to Michael Woerister from comment #4)
> 3. Finally: How does a build with xLTO and no manual inlining compare to the
> current state (no xLTO, code duplication) - This is the most likely
> configuration to replace the current one (i.e. replace manual work with a
> compiler optimizations). It looks like only the "tabpaint-from-parent pgo
> e10s stylo" benchmark is affected negatively, while some others show slight
> improvements.
> https://treeherder.mozilla.org/perf.html#/
> compare?originalProject=try&originalRevision=fa86f94963255a7bff3bc6fef741736b
> 16ab1b92&newProject=try&newRevision=ca457e1b9ef3e473bc9f23de5a2e0dacb7dc8364&
> framework=1&showOnlyConfident=1

Is there a speedometer number for this comparison?
Did we validate whether xLTO removes some code duplication?

Looking at the benchmark that shows the regression it appears that xLTO causes tabpaint-from-parent to exhibit the higher bimodal behaviour more often. Is that common for this benchmark?
> Did we validate whether xLTO removes some code duplication?

I verified that xLTO has the expected effect, i.e. that function calls get inlined across language boundaries.
It looks like the artifacts and logs for this task have expired so I don't have access to the replicates.
Given the noisiness of this particular test, I certainly can't discern a regression there.
Flags: needinfo?(acreskey)
Depends on: 1512723
Depends on: 1533010

Strictly speaking, I guess I landed using the wrong bug, but in any case this work was done as part of bug 1512723.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED

Let's use this bug to enable it on the full set of platforms.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---

1.34 hit release, can we enable on Linux and Mac now?

Flags: needinfo?(dmajor)

I believe so. (It was my understanding that glandium would do that so I haven't been paying too close attention to the details.)

Flags: needinfo?(dmajor)

(In reply to David Major [:dmajor] (low availability) from comment #17)

I believe so. (It was my understanding that glandium would do that so I haven't been paying too close attention to the details.)

glandium, is there anything blocking us from enabling on all platforms now or can we update to 1.34 everywhere?

Flags: needinfo?(mh+mozilla)

We switched to 1.34 on all platforms, already. The same as bug 1512723 can be done on other platforms (i.e. adjusting mozconfigs)

A generic long term way to support this is another matter. It's on my todo list, but not currently at the top.

Flags: needinfo?(mh+mozilla)
Depends on: 1543469

Updated to exclude linux64-aarch64 and win32, win64-aarch64 and osx updated to gate off of MOZ_LTO:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=9af660f828a281acaac9e150cbad0f855b373c22

Nathan, there's a linker error on linux32. I'm not sure it's worth looking into or if we should just exclude linux32 from xLTO, but you might want to be aware. I'm going on PTO for a bit, can you help finish this off if there need to be other changes?

Flags: needinfo?(nfroyd)

(In reply to Eric Rahm [:erahm] from comment #22)

Updated to exclude linux64-aarch64 and win32, win64-aarch64 and osx updated to gate off of MOZ_LTO:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=9af660f828a281acaac9e150cbad0f855b373c22

Nathan, there's a linker error on linux32. I'm not sure it's worth looking into or if we should just exclude linux32 from xLTO, but you might want to be aware. I'm going on PTO for a bit, can you help finish this off if there need to be other changes?

Linux32 and OSX both have geckodriver issues, so we probably need to look into that.

glandium, it looks like win64-aarch64 is upset that we're specifying RUSTFLAGS. You're most familiar with this setup, can you take a look at it? I'm assuming it has something to do with repackaging.

Flags: needinfo?(mh+mozilla)

You need to set RUSTFLAGS before build/mozconfig.no-compile resets it.

Flags: needinfo?(mh+mozilla)

The aarch64 windows thing is trivial (need to check USE_ARTIFACT before setting RUSTFLAGS). The linux32 and OS X cases are a little more interesting.

The Rust backend adds the -plugin-opt arguments that ld is complaining about:

https://github.com/rust-lang/rust/blob/8260e9676089363648dfbce072fca562093b4289/src/librustc_codegen_ssa/back/linker.rs#L211-L213

Reading through the binutils ld code, it looks like the -plugin-opt parsing code expects whatever invoked ld to have already provided -plugin; if there's no plugin registered, that's the only way -plugin-opt parsing can fail. -plugin isn't being provided on linux32 and OS X, so presumably that's why things are failing.

Rust's documentation says that things magically work with lld and that you need to be explicit about some other linker plugin if you're using a different linker (-Clinker-plugin-lto=/path/to/file). That would explain why things work on Windows. But I thought we didn't use lld in automation on our Linux platforms (or OS X, for that matter), so I don't know why the Linux x86-64 builds are working.

And...both the linux32 and OS X builds successfully linked gkrust (with -Clinker-plugin-lto), which should have failed given the above, since we're not using lld in either case.

So why are only host binaries the problem here? I would have expected that:

https://searchfox.org/mozilla-central/source/config/makefiles/rust.mk#133-138

meant that we didn't pass down a set RUSTFLAGS to host binaries, but that does not seem to be the case?

Flags: needinfo?(nfroyd)

(In reply to Nathan Froyd [:froydnj] from comment #25)

The aarch64 windows thing is trivial (need to check USE_ARTIFACT before setting RUSTFLAGS).

Or comment 24.

The linux32 and OS X cases are a little more interesting.

The Rust backend adds the -plugin-opt arguments that ld is complaining about:

https://github.com/rust-lang/rust/blob/8260e9676089363648dfbce072fca562093b4289/src/librustc_codegen_ssa/back/linker.rs#L211-L213

Reading through the binutils ld code, it looks like the -plugin-opt parsing code expects whatever invoked ld to have already provided -plugin; if there's no plugin registered, that's the only way -plugin-opt parsing can fail. -plugin isn't being provided on linux32 and OS X, so presumably that's why things are failing.

Rust's documentation says that things magically work with lld and that you need to be explicit about some other linker plugin if you're using a different linker (-Clinker-plugin-lto=/path/to/file). That would explain why things work on Windows. But I thought we didn't use lld in automation on our Linux platforms (or OS X, for that matter), so I don't know why the Linux x86-64 builds are working.

And...both the linux32 and OS X builds successfully linked gkrust (with -Clinker-plugin-lto), which should have failed given the above, since we're not using lld in either case.

gkrust is also not linked by ld, since it's a static library.

So why are only host binaries the problem here? I would have expected that:

https://searchfox.org/mozilla-central/source/config/makefiles/rust.mk#133-138

meant that we didn't pass down a set RUSTFLAGS to host binaries, but that does not seem to be the case?

The problem for linux32 is that it's a 1-tier PGO build, and it fails during the profile-generate pass, where LTO is disabled as far as C++ is concerned. So -flto is not passed to clang, so it doesn't add -Wl,-plugin. The x86-64 build is 3-tier PGO, so LTO is enabled in the relevant build.

On mac, the problem is that the linker is ld64, and the option to pass flags to the plugin is different. In fact, I'm not even sure there's one to pass arbitrary flags.

I guess we're back to "we shouldn't try to LTO host programs in the first place" land.

That said, it seems there's also an issue to open on the rust side. I'll come up with a small test case.

(In reply to Mike Hommey [:glandium] from comment #26)

I guess we're back to "we shouldn't try to LTO host programs in the first place" land.

or more accurately "host programs and geckodriver", because, here's the funny thing: geckodriver is not a host program (and I /guess/ the only reason for that is to have a mac geckodriver).

I went with glandium's --enable-lto=cross idea, which was not that hard to implement, and got to:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=7cf9ae47c82291c8d9fdd4bdedfd1bcc42081861

The Mac failures are because there's some kind of mismatch between what Rust thinks the target is and what Clang thinks the target is? Or what the linker thinks the target is vs. some of the bitfiles? Not totally sure.

The Win32 failure is from a bunch of non-existent symbols in the Stylo Rust/C++ glue code. Somebody is dropping an extern "C" somewhere along the way, perhaps? I'm a little confused as to why that would happen on win32 and not win64, but perhaps people haven't really pushed on LTO on win32...

I am not sure what's going on yet. emilio and dmajor pointed out that win32 needs _ prefixes, whereas win64 does not, so that would explain why win64 works and win32 does not. And emilio pointed out that bindgen uses the sekret \u{1] prefix on external symbols so LLVM doesn't touch the mangling, which also seems involved somehow (since we use #[link_name="\u{1}$NAME"] on Rust names for C++-defined things).

But trying to write a simple Rust/C++ LTO example did not trigger the problem. A win32 build is running right now, and ideally inspecting the generated bindings will point out what I might be doing differently than the actual problem scenario.

Depends on: 1546438

It seems that Rust code is compiled for x86_64-apple-darwin while C code is compiled for x86_64-apple-darwin11. The LLVM ERROR: ThinLTO modules with incompatible triples not supported looks like the LLVM linker plugin then refuses to mix modules for the different targets together. rustc does not seem to have the darwin11 target: https://github.com/rust-lang/rust/tree/master/src/librustc_target/spec.

(In reply to Michael Woerister from comment #32)

It seems that Rust code is compiled for x86_64-apple-darwin while C code is compiled for x86_64-apple-darwin11. The LLVM ERROR: ThinLTO modules with incompatible triples not supported looks like the LLVM linker plugin then refuses to mix modules for the different targets together. rustc does not seem to have the darwin11 target: https://github.com/rust-lang/rust/tree/master/src/librustc_target/spec.

I went through and made everything on the C++ side use (what I thought) was x86_64-apple-darwin, or at least x86_64-darwin. I still got errors.

I patched clang to actually inform me what the mismatched triples were and got a surprising answer: "ThinLTO modules with incompatible triples not supported: builder x86_64-unknown-macosx10.9.0 module triple x86_64-apple-darwin".

A little more investigation says that clang --target=x86_64-darwin tags its bitcode with the x86_64-unknown-macosx10.9.0 triple. Actually, that's not quite right, because clang will apparently pull the version out of the SDK (10.11 in our case) unless it's overriden by MACOSX_DEPLOYMENT_TARGET, which we set to 10.9. Groveling through the clang source, it looks like it has compiler magic to look at the SDK in use and extract the version from that, and put together a "real triple" that gets used. Or from MACOSX_DEPLOYMENT_TARGET, depending.

I am not sure whether we should do the equivalent bits on the Rust side. Handling MACOSX_DEPLOYMENT_TARGET seems kind of reasonable, but groveling through the SDK seems like a bit much. I wonder if it would work to run rustc --target=x86_64-apple-darwin -C llvm-args=--target=x86_64-unknown-macosx10.9.0?

Or we can work around it on the C++ side by doing clang --target=blah ... -Xclang -triple -Xclang x86_64-apple-darwin, which will put the right triple in the bitcode.

I filed https://github.com/rust-lang/rust/issues/60235 for the Rust side of things. The -C llvm-args idea did not work out; I suppose we can try hacking around this with the -Xclang hack for now?

Sigh, the -Xclang hack works on simple examples, but with a full Firefox build, we get lots of errors about TLS not being supported on the target. So that is a dead end.

I did a more structured investigation of the x86 Windows problem and could confirm my initial suspicions:

  • bindgen asks libclang for the mangled name of C entities and then generates Rust bindings that hard-code that mangled name via the #[link_name] attribute.
  • For extern "C" entities on 32bit Windows the mangling happens when LLVM IR is lowered to machine code, so the LLVM IR usually still contains the unmangled name.
  • This leads to the situation where the Rust side of things has calls to the mangled name in LLVM IR but the C side of things has definitions with the unmangled name, e.g. Rust has a call to _Gecko_ReleaseAtom but the C definition has the name Gecko_ReleaseAtom.
  • In a non-cross-language LTO setting this works out because by the time the linker sees the names, the C side has been mangled too.
  • In a cross-language LTO setting, however, the linker operates on the LLVM IR level, where the mismatch is still present.

On x64 Windows this is not a problem because no mangling happens during lowering to machine code. I suspect, however, that we'll run into a similar problem on macOS, where LLVM adds an underscore prefix to symbol names too.

I was able to fix the problem locally by modifying bindgen to not emit #[link_name] attributes if the mangling matched the x86 Windows mangling. That is also my suggested solution to the problem. Hopefully there is a way of implementing this robustly in bindgen.

Emilio, do you have an opinion on the best way to solve comment 37 in bindgen?

Flags: needinfo?(emilio)

I talked with mw this morning. I pointed him to here where we can presumably discern explicit extern "C" vs. non-extern "C" functions in C++.

Here is where we get the mangled cursor with libclang. I'm not sure what Michael has implemented yet, but it should be around there, presumably.

Flags: needinfo?(emilio)

What I did for my proof-of-concept fix was to do the following check:

let skip_link_name_attr = match abi {
    Abi::C => mangled == format!("_{}", canonical_name),
    Abi::FastCall | Abi::StdCall => panic!("not yet implemented"),
    _ => false,
};

here https://github.com/rust-lang/rust-bindgen/blob/3bdfcf7992041acbc7a9278e82d6def8cbd88952/src/codegen/mod.rs#L3394, which is neither very efficient nor very elegant :)

If you did that like:

let skip_link_name_attr = mangled == canonical_name ||
    (abi == Abi::C &&
     !mangled.is_empty() &&
     &mangled[1..] == canonical_name);

if !skip_link_name_attr {
    attributes.push(..);
}

It'd be reasonably ok, I think. That being said... Is there any kind of C++ mangling in any target platform that doesn't have this weird backend mangling that may end up legitimately prepending just an underscore or such?

Cross-language LTO works for Mac with a motley stack of patches and a bleeding-edge rustc:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=06e47b0d019ca2b5268a03e70d0e0db8ac8bb87f

Mac has the same prefixing convention as win32 does, but I believe the underscores actually wind up in the bitcode files in all cases (that is, they're not implicit as they are on win32), so we don't wind up with the same set of issues as we have on win32.

Note: Michael has a PR up to fix the Win32 issue.

So I think what's left is:

  1. Bug 1546438 needs to stick (currently on autoland)
  2. My patch needs to be replaced with Nathan's patch stack (possibly split out as blockers)
  3. mw's PR needs to land
  4. A new version of bindgen needs to be released, we need to update to it
  5. Wait for the rust fix to hit beta (1.36 goes to beta on May 23rd I believe)

The last point isn't clear to me. Can we switch to nightly rust for our nightly Firefox builds?

For reference the rust fix in point 5 is the issue referred to in comment 35.

Attachment #9058475 - Attachment is obsolete: true
Depends on: 1551690

(In reply to Eric Rahm [:erahm] from comment #43)

  1. Wait for the rust fix to hit beta (1.36 goes to beta on May 23rd I believe)

The last point isn't clear to me. Can we switch to nightly rust for our nightly Firefox builds?

This would be an interesting change to make.

The Rust side fix has been nominated for uplift to beta 1.35, which would be usable in infra in a couple of weeks:

https://github.com/rust-lang/rust/pull/60378#issuecomment-492363692

Regarding macOS: I looked into this some more because I was surprised that it worked without the rust-bindgen patch. I verified that the leading underscore for symbol names is not present at the LLVM IR (e.g. the definition is called Gecko_ReleaseAtom, not _Gecko_ReleaseAtom). Rust code, on the other hand, contains calls to @"\01_Gecko_ReleaseAtom", i.e. with the leading underscore - so pretty much the same situation as on x86 Windows.

So why do things work then? My guess is that the linker on macOS is using the ThinLTO plugin a bit differently than LLD, in a way that hides the mismatch. I managed to make the LLVM plugin emit optimization remarks and it is full of messages like:

---- !Missed
-Pass:            inline
-Name:            NoDefinition
-DebugLoc:        { File: 'servo/components/style/gecko_string_cache/mod.rs', 
-                   Line: 454, Column: 16 }
-Function:        _ZN4core3ptr18real_drop_in_place17h49eb718801f38d81E
-Args:            
:  - Callee:          _Gecko_ReleaseAtom
-  - String:          ' will not be inlined into '
-  - Caller:          _ZN4core3ptr18real_drop_in_place17h49eb718801f38d81E
-    DebugLoc:        { File: '/rustc/4443957f272e304e083a8d98583e608d65a712aa/src/libcore/ptr.rs', 
-                       Line: 195, Column: 0 }
-  - String:          ' because its definition is unavailable'

That is, it can't find a definition with the name _Gecko_ReleaseAtom (because it would have to look under Gecko_ReleaseAtom) and thus just ignores the call. Later the Gecko_ReleaseAtom function is lowered to machine code with the mangled name at which point the linker can find it.

The optimization remarks also contain cases where inlining happens because the names match up (since both caller and callee are in C/C++):

---- !Passed
-Pass:            inline
-Name:            Inlined
-DebugLoc:        { File: 'dist/include/nsAtom.h', Line: 168, Column: 9 }
:Function:        Gecko_ReleaseAtom
-Args:            
-  - Callee:          _ZN13nsDynamicAtom11GCAtomTableEv
-    DebugLoc:        { File: 'xpcom/ds/nsAtomTable.cpp', Line: 452, Column: 0 }
-  - String:          ' inlined into '
:  - Caller:          Gecko_ReleaseAtom
-    DebugLoc:        { File: 'layout/style/GeckoBindings.cpp', Line: 961, 
-                       Column: 0 }
-  - String:          ' with '
-  - String:          '(cost='
-  - Cost:            '110'
-  - String:          ', threshold='
-  - Threshold:       '225'
-  - String:          ')'

I'll take another look once we've updated rust-bindgen.

Depends on: 1552329

Latest status:

  • The rust-bindgen fix landed, we've updated to the latest version in-tree
  • The rust team is not going to uplift the target triple rustc fixes. Given this is for mac only that's okay, we can just wait a few days for 1.36 to hit beta and the switch mac over to using that
  • The remaining blocker, bug 1551690, is pending review

Next steps:

  • Michael is going to verify that mac builds are inlined as expected (background in comment 46)
  • Put up the final patches to --enable-lto=cross

I can confirm that a local macOS build with the new rust-bindgen version successfully performs cross-language LTO. LLVM optimization remarks contain many instances of C++ code being inlined into Rust code, e.g.:

--- !Passed
Pass:            inline
Name:            Inlined
DebugLoc:        { File: '/Users/mw/gecko-dev/servo/components/style/gecko_string_cache/mod.rs', 
                   Line: 455, Column: 16 }
Function:        Servo_IsCssPropertyRecordedInUseCounter
Args:            
  - Callee:          Gecko_ReleaseAtom
    DebugLoc:        { File: 'layout/style/GeckoBindings.cpp', Line: 961, 
                       Column: 0 }
  - String:          ' inlined into '
  - Caller:          Servo_IsCssPropertyRecordedInUseCounter
    DebugLoc:        { File: 'servo/ports/geckolib/glue.rs', Line: 6453, 
                       Column: 0 }
  - String:          ' with '
  - String:          '(cost='
  - Cost:            '160'
  - String:          ', threshold='
  - Threshold:       '225'
  - String:          ')'
Depends on: 1554286

This change is a no-op for win64 configs, as they had this feature before.

This change is a little bit of a cheat, because of course MSVC doesn't
do cross-language LTO by default, but it seems consistent.

Depends on D33317

Pushed by nfroyd@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/d083a8bd98ed
switch all existing `MOZ_LTO` configs to use cross-language LTO; r=dmajor
https://hg.mozilla.org/integration/autoland/rev/5e85998c4d97
default clang-cl pgo to use cross-language LTO; r=dmajor

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=testfailed%2Cbusted%2Cexception&tochange=566166b6edfb6eec0bf9056fc6d638eda52a72fb&fromchange=5e85998c4d97a236e8eaf1aaff622a979f60d789&selectedJob=249494231

Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=249494231&repo=autoland&lineNumber=978
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=249494232&repo=autoland&lineNumber=3318
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=249495313&repo=autoland&lineNumber=5087

Backout link: https://hg.mozilla.org/integration/autoland/rev/fb7b087ff3035227a0f9c6b0cf78375f32daaada

23:01:12 INFO - PROCESS-CRASH | browser/components/extensions/test/browser/browser_ext_devtools_inspectedWindow_eval_bindings.js | application crashed [@ servo_arc::Arc<T>::drop_slow + 0x80]
23:01:12 INFO - Crash dump filename: /var/folders/lv/vhwyrgjd40s972dbfck2fnk000000w/T/tmpbxgx3n.mozrunner/minidumps/0281DDD2-556E-4B8D-A3BE-7F9EB864FCA6.dmp
23:01:12 INFO - Operating system: Mac OS X
23:01:12 INFO - 10.10.5 14F27
23:01:12 INFO - CPU: amd64
23:01:12 INFO - family 6 model 69 stepping 1
23:01:12 INFO - 4 CPUs
23:01:12 INFO -
23:01:12 INFO - GPU: UNKNOWN
23:01:12 INFO -
23:01:12 INFO - Crash reason: EXC_BAD_ACCESS / EXC_I386_GPFLT
23:01:12 INFO - Crash address: 0x0
23:01:12 INFO - Process uptime: 130 seconds
23:01:12 INFO -
23:01:12 INFO - Thread 0 (crashed)
23:01:12 INFO - 0 XUL!servo_arc::Arc<T>::drop_slow + 0x80
23:01:12 INFO - rax = 0x0000000000000000 rdx = 0x0000000000000002
23:01:12 INFO - rcx = 0x000000000000001b rbx = 0x00000001346eb5d0
23:01:12 INFO - rsi = 0x0000000000800000 rdi = 0xe5e5e5e5e5e5e5e4
23:01:12 INFO - rbp = 0x00007fff5b6bea70 rsp = 0x00007fff5b6bea50
23:01:12 INFO - r8 = 0x00000001046000b8 r9 = 0x000000000000000a
23:01:12 INFO - r10 = 0x000000010c2b3c60 r11 = 0x0000000000000007
23:01:12 INFO - r12 = 0x000000012714c700 r13 = 0x000000012c4dd748
23:01:12 INFO - r14 = 0x0000000132e0af40 r15 = 0x0000000000000000
23:01:12 INFO - rip = 0x000000010a684290
23:01:12 INFO - Found by: given as instruction pointer in context
23:01:12 INFO - 1 XUL!Servo_StyleRule_GetStyle [glue.rs:5e85998c4d97a236e8eaf1aaff622a979f60d789 : 2073 + 0x21]
23:01:12 INFO - rbp = 0x00007fff5b6beaf0 rsp = 0x00007fff5b6bea80
23:01:12 INFO - rip = 0x000000010a703fd3
23:01:12 INFO - Found by: previous frame's frame pointer
23:01:12 INFO - 2 XUL!mozilla::dom::CSSSupportsRule::CSSSupportsRule(RefPtr<RawServoSupportsRule>, mozilla::StyleSheet*, mozilla::css::Rule*, unsigned int, unsigned int) [CSSSupportsRule.cpp:5e85998c4d97a236e8eaf1aaff622a979f60d789 : 23 + 0x2d]
23:01:12 INFO - rbp = 0x00007fff5b6beb40 rsp = 0x00007fff5b6beb00
23:01:12 INFO - rip = 0x00000001081e9fdd
23:01:12 INFO - Found by: previous frame's frame pointer
23:01:12 INFO - 3 XUL!mozilla::ServoCSSRuleList::GetRule(unsigned int) [ServoCSSRuleList.cpp:5e85998c4d97a236e8eaf1aaff622a979f60d789 : 81 + 0x8]
23:01:12 INFO - rbp = 0x00007fff5b6bebb0 rsp = 0x00007fff5b6beb50
23:01:12 INFO - rip = 0x0000000108224bb1
23:01:12 INFO - Found by: previous frame's frame pointer
23:01:12 INFO - 4 XUL!mozilla::dom::CSSRuleList_Binding::DOMProxyHandler::getOwnPropDescriptor(JSContext*, JS::Handle<JSObject*>, JS::Handle<JS::PropertyKey>, bool, JS::MutableHandle<JS::PropertyDescriptor>) const [CSSRuleListBinding.cpp: : 289 + 0x5]
23:01:12 INFO - rbp = 0x00007fff5b6bec20 rsp = 0x00007fff5b6bebc0
23:01:12 INFO - rip = 0x00000001069d79ca
23:01:12 INFO - Found by: previous frame's frame pointer
23:01:12 INFO - 5 XUL!mozilla::dom::XrayResolveOwnProperty(JSContext*, JS::Handle<JSObject*>, JS::Handle<JSObject*>, JS::Handle<JS::PropertyKey>, JS::MutableHandle<JS::PropertyDescriptor>, bool&) [BindingUtils.cpp:5e85998c4d97a236e8eaf1aaff622a979f60d789 : 1680 + 0x6]
23:01:12 INFO - rbp = 0x00007fff5b6becb0 rsp = 0x00007fff5b6bec30
23:01:12 INFO - rip = 0x0000000107120273
23:01:12 INFO - Found by: previous frame's frame pointer
23:01:12 INFO - 6 XUL!xpc::DOMXrayTraits::resolveOwnProperty(JSContext*, JS::Handle<JSObject*>, JS::Handle<JSObject*>, JS::Handle<JSObject*>, JS::Handle<JS::PropertyKey>, JS::MutableHandle<JS::PropertyDescriptor>) [XrayWrapper.cpp:5e85998c4d97a236e8eaf1aaff622a979f60d789 : 1676 + 0x11]
23:01:12 INFO - rbp = 0x00007fff5b6bed30 rsp = 0x00007fff5b6becc0
23:01:12 INFO - rip = 0x0000000105960d5d
23:01:12 INFO - Found by: previous frame's frame pointer
23:01:12 INFO - 7 XUL!xpc::XrayWrapper<js::CrossCompartmentWrapper, xpc::DOMXrayTraits>::getOwnPropertyDescriptor(JSContext*, JS::Handle<JSObject*>, JS::Handle<JS::PropertyKey>, JS::MutableHandle<JS::PropertyDescriptor>) const [XrayWrapper.cpp:5e85998c4d97a236e8eaf1aaff622a979f60d789 : 1884 + 0x1a]
23:01:12 INFO - rbp = 0x00007fff5b6bedc0 rsp = 0x00007fff5b6bed40
23:01:12 INFO - rip = 0x000000010596088b
23:01:12 INFO - Found by: previous frame's frame pointer
23:01:12 INFO - 8 XUL!xpc::XrayWrapper<js::CrossCompartmentWrapper, xpc::DOMXrayTraits>::hasOwn(JSContext*, JS::Handle<JSObject*>, JS::Handle<JS::PropertyKey>, bool*) const [XrayWrapper.cpp:5e85998c4d97a236e8eaf1aaff622a979f60d789 : 2115 + 0x5b]
23:01:12 INFO - rbp = 0x00007fff5b6bee20 rsp = 0x00007fff5b6bedd0
23:01:12 INFO - rip = 0x00000001059638f5
23:01:12 INFO - Found by: previous frame's frame pointer
23:01:12 INFO - 9 XUL!js::ProxyGetPropertyByValue(JSContext*, JS::Handle<JSObject*>, JS::Handle<JS::Value>, JS::MutableHandle<JS::Value>) [Proxy.cpp:5e85998c4d97a236e8eaf1aaff622a979f60d789 : 369 + 0x21]
23:01:12 INFO - rbp = 0x00007fff5b6beed0 rsp = 0x00007fff5b6bee30
23:01:12 INFO - rip = 0x0000000109c00b66
23:01:12 INFO - Found by: previous frame's frame pointer

Flags: needinfo?(nfroyd)

Ugh. Emilio, does anything stand out to you in that crash stack?

Flags: needinfo?(nfroyd) → needinfo?(emilio)

Not really, that stack looks perfectly normal to me. :(

Though we've seen crash stacks with drop_slow near null in the past, and I don't think we did figure out how those could really happen. The crash in CSSPageRule::~CSSPageRule looks a bit more scary even...

Looks like it also fails some tests and there are some other crashes that may be easier to debug under Servo_DeclarationBlock_Clone. Seems like that's panicking somewhere but I don't see the panic message anywhere:

https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=249494232&repo=autoland&lineNumber=3318

Flags: needinfo?(emilio)

Emilio, with Nathan out for two weeks and mw out for two months is this something you have time to look into further? This seems to be the final blocker for landing xLTO.

Flags: needinfo?(emilio)

So I tried to take a look at some of the minidumps from Linux, in particular this one.

I followed the docs here and tried to use minidump-2-core, but it crashed. It looked like a simple null-check was missing so I applied this and re-built it:

diff --git a/src/tools/linux/md2core/minidump-2-core.cc b/src/tools/linux/md2core/minidump-2-core.cc
index 941586e9..72680c5f 100644
--- a/src/tools/linux/md2core/minidump-2-core.cc
+++ b/src/tools/linux/md2core/minidump-2-core.cc
@@ -1223,6 +1223,9 @@ main(int argc, const char* argv[]) {
   for (unsigned i = 0; i < header->stream_count; ++i) {
     const MDRawDirectory* dirent =
         dump.GetArrayElement<MDRawDirectory>(header->stream_directory_rva, i);
+    if (!dirent) {
+      continue;
+    }
     switch (dirent->stream_type) {
       case MD_SYSTEM_INFO_STREAM:
         ParseSystemInfo(options, &crashinfo, dump.Subrange(dirent->location),
@@ -1241,6 +1244,9 @@ main(int argc, const char* argv[]) {
   for (unsigned i = 0; i < header->stream_count; ++i) {
     const MDRawDirectory* dirent =
         dump.GetArrayElement<MDRawDirectory>(header->stream_directory_rva, i);
+    if (!dirent) {
+      continue;
+    }
     switch (dirent->stream_type) {
       case MD_THREAD_LIST_STREAM:
         ParseThreadList(options, &crashinfo, dump.Subrange(dirent->location),

But running that errors out with Cannot determine input file format.. Gabriele do you know what the best way to debug those dumps from Linux is?

I'll try to look at the binary and compare it with the ones from the previous push in case I find something interesting.

Flags: needinfo?(gsvelto)

I couldn't debug the minidump, and looking at the symbols of the FFI functions involved, I'd just note that:

  • With fat lto, the compiler unifies some of the symbols to point to the same function (Servo_StyleRule_GetStyle and Servo_SupportsRule_GetRules end up with the same address).
  • With thin/cross lto, that doesn't happen, and the function looks somewhat different (as in, there are some extra instructions that are not on the fat lto build).

But I don't know in which instruction we're actually crashig because of my previous comment, so I'm not quite sure about how to go about debugging this.

Flags: needinfo?(emilio)

Create a mozconfig with:

ac_add_options --enable-project=tools/crashreporter

then mach build.

You'll end up with a $objdir/dist/bin/minidump_stackwalk that you can use by giving it the path to the .dmp and the path to the unzipped target.crashreporter-symbols.zip.

This gives this:

Crash reason:  EXC_BAD_ACCESS / EXC_I386_GPFLT
Crash address: 0x0
Process uptime: 128 seconds

Thread 0 (crashed)
 0  XUL!servo_arc::Arc<T>::drop_slow + 0x80
    rax = 0x0000000000000000   rdx = 0x0000000000000004
    rcx = 0x0000000000000001   rbx = 0x000000011f889890
    rsi = 0x0000000129a00e28   rdi = 0xe5e5e5e5e5e5e5e4
    rbp = 0x00007fff5d78fb40   rsp = 0x00007fff5d78fb20
     r8 = 0x000000010d7000b8    r9 = 0x000000000000000a
    r10 = 0x00007fff5d78fa98   r11 = 0x0000000000000000
    r12 = 0x0000000122c18ca0   r13 = 0x0000000122cc0158
    r14 = 0x0000000122ca0460   r15 = 0x0000000000000000
    rip = 0x00000001082f2290
    Found by: given as instruction pointer in context
 1  XUL!Servo_StyleRule_GetStyle [glue.rs:5e85998c4d97a236e8eaf1aaff622a979f60d789 : 2073 + 0x21]
    rbp = 0x00007fff5d78fbc0   rsp = 0x00007fff5d78fb50
    rip = 0x0000000108371fd3
    Found by: previous frame's frame pointer

Note that while doing that, I was first welcome with

minidump.cc:5213: ERROR: Minidump header signature mismatch: (0x8088b1f, 0x1f8b0808) != 0x504d444d

which sounds like what you might have hit, and that's because taskcluster likes to "transparently" gzip things. So the downloaded .dmp is actually a .gz file you need to gunzip first.

Wait, hold on, looking at that closely again, that stack actually makes no actual sense! That is this function here:

And that's not supposed to drop any Arc at all, wth?

That into_strong call is supposed to prevent the Arc getting dropped by transmuting it into a Strong<T>, which has no Drop implementation. So that stack smells a bit like miscompilation (hopefully) or UB going bad (hopefully not).

We haven't touched minidump-2-core for quite a while so I'm not sure if it's reliable. On the other hand minidump_stackwalk should print out some decent stacks following the instructions Mike wrote in comment 58.

Flags: needinfo?(gsvelto)

Ok, I'll try again a bit harder.

Flags: needinfo?(emilio)

If you disassemble XUL!Servo_StyleRule_GetStyle, is there really a call to drop_slow at offset 0x21? (Or maybe a few bytes before, if 0x21 is the address of the instruction after)

Yes, there is a call to drop_slow:

(lldb) disas -a 0x0000000005b70fd3
XUL`Servo_SupportsRule_GetRules:
XUL[0x5b70f20] <+0>:   pushq  %rbp
XUL[0x5b70f21] <+1>:   movq   %rsp, %rbp
XUL[0x5b70f24] <+4>:   pushq  %r15
XUL[0x5b70f26] <+6>:   pushq  %r14
XUL[0x5b70f28] <+8>:   pushq  %rbx
XUL[0x5b70f29] <+9>:   subq   $0x58, %rsp
XUL[0x5b70f2d] <+13>:  movq   %rsi, %r14
XUL[0x5b70f30] <+16>:  movq   %rdi, %rbx
XUL[0x5b70f33] <+19>:  leaq   0x1b6eb9e(%rip), %rax     ; _$LT$style..global_style_data..GLOBAL_STYLE_DATA$u20$as$u20$core..ops..deref..Deref$GT$::deref::__stability::LAZY::hd2043aa93a146e3c (.llvm.15187239529951727520)
XUL[0x5b70f3a] <+26>:  movq   %rax, -0x28(%rbp)
XUL[0x5b70f3e] <+30>:  movq   0x1b6ebab(%rip), %rax     ; _$LT$style..global_style_data..GLOBAL_STYLE_DATA$u20$as$u20$core..ops..deref..Deref$GT$::deref::__stability::LAZY::hd2043aa93a146e3c (.llvm.15187239529951727520) + 24
XUL[0x5b70f45] <+37>:  cmpq   $0x3, %rax
XUL[0x5b70f49] <+41>:  jne    0x5b70fe9                 ; <+201>
XUL[0x5b70f4f] <+47>:  movq   -0x28(%rbp), %rax
XUL[0x5b70f53] <+51>:  cmpb   $0x2, 0x10(%rax)
XUL[0x5b70f57] <+55>:  je     0x5b7101e                 ; <+254>
XUL[0x5b70f5d] <+61>:  movq   (%rax), %r15
XUL[0x5b70f60] <+64>:  testq  %r15, %r15
XUL[0x5b70f63] <+67>:  je     0x5b7103b                 ; <+283>
XUL[0x5b70f69] <+73>:  movabsq $-0x8000000000000000, %rcx ; imm = 0x8000000000000000 
XUL[0x5b70f73] <+83>:  xorl   %eax, %eax
XUL[0x5b70f75] <+85>:  lock   
XUL[0x5b70f76] <+86>:  cmpxchgq %rcx, 0x8(%r15)
XUL[0x5b70f7b] <+91>:  jne    0x5b71047                 ; <+295>
XUL[0x5b70f81] <+97>:  movq   (%rbx), %rax
XUL[0x5b70f84] <+100>: testq  %rax, %rax
XUL[0x5b70f87] <+103>: je     0x5b71023                 ; <+259>
XUL[0x5b70f8d] <+109>: cmpq   %rax, %r15
XUL[0x5b70f90] <+112>: jne    0x5b71023                 ; <+259>
XUL[0x5b70f96] <+118>: movq   -0x8(%r14), %rax
XUL[0x5b70f9a] <+122>: addq   $-0x8, %r14
XUL[0x5b70f9e] <+126>: cmpq   $-0x1, %rax
XUL[0x5b70fa2] <+130>: je     0x5b70fae                 ; <+142>
XUL[0x5b70fa4] <+132>: lock   
XUL[0x5b70fa5] <+133>: incq   (%r14)
XUL[0x5b70fa8] <+136>: jle    0x5b710bc                 ; <+412>
XUL[0x5b70fae] <+142>: movq   0x28(%rbx), %rax
XUL[0x5b70fb2] <+146>: addq   $0x28, %rbx
XUL[0x5b70fb6] <+150>: movq   (%rax), %rcx
XUL[0x5b70fb9] <+153>: cmpq   $-0x1, %rcx
XUL[0x5b70fbd] <+157>: je     0x5b70fd3                 ; <+179>
XUL[0x5b70fbf] <+159>: lock   
XUL[0x5b70fc0] <+160>: decq   (%rax)
XUL[0x5b70fc3] <+163>: jne    0x5b70fd3                 ; <+179>
XUL[0x5b70fc5] <+165>: movq   (%rbx), %rax
XUL[0x5b70fc8] <+168>: movq   (%rax), %rax
XUL[0x5b70fcb] <+171>: movq   %rbx, %rdi
XUL[0x5b70fce] <+174>: callq  0x5af1210                 ; servo_arc::Arc$LT$T$GT$::drop_slow::h5bfb4b1675d356ee
XUL[0x5b70fd3] <+179>: movq   %r14, (%rbx)
XUL[0x5b70fd6] <+182>: movq   $0x0, 0x8(%r15)
XUL[0x5b70fde] <+190>: addq   $0x58, %rsp
XUL[0x5b70fe2] <+194>: popq   %rbx
XUL[0x5b70fe3] <+195>: popq   %r14
XUL[0x5b70fe5] <+197>: popq   %r15
XUL[0x5b70fe7] <+199>: popq   %rbp
XUL[0x5b70fe8] <+200>: retq   
XUL[0x5b70fe9] <+201>: leaq   -0x28(%rbp), %rax
XUL[0x5b70fed] <+205>: movq   %rax, -0x38(%rbp)
XUL[0x5b70ff1] <+209>: leaq   -0x38(%rbp), %rax
XUL[0x5b70ff5] <+213>: movq   %rax, -0x68(%rbp)
XUL[0x5b70ff9] <+217>: leaq   0x1b6eaf0(%rip), %rdi     ; _$LT$style..global_style_data..GLOBAL_STYLE_DATA$u20$as$u20$core..ops..deref..Deref$GT$::deref::__stability::LAZY::hd2043aa93a146e3c (.llvm.15187239529951727520) + 24
XUL[0x5b71000] <+224>: leaq   0x1b1bd71(%rip), %rdx     ; anon.20480550abd40f7c5a6ab6f2e04b754b.85.llvm.15187239529951727520
XUL[0x5b71007] <+231>: leaq   -0x68(%rbp), %rsi
XUL[0x5b7100b] <+235>: callq  0x5f24d50                 ; std::sync::once::Once::call_inner::h3f5b185bfe61a1f0 (.llvm.15187239529951727520)
XUL[0x5b71010] <+240>: movq   -0x28(%rbp), %rax
XUL[0x5b71014] <+244>: cmpb   $0x2, 0x10(%rax)
XUL[0x5b71018] <+248>: jne    0x5b70f5d                 ; <+61>
XUL[0x5b7101e] <+254>: callq  0x5edf530                 ; lazy_static::lazy::unreachable_unchecked::h77f590b606dfa998 (.llvm.15187239529951727520)
XUL[0x5b71023] <+259>: leaq   0x80d97e(%rip), %rdi      ; str.C.4618 + 104
XUL[0x5b7102a] <+266>: leaq   0x1b1b447(%rip), %rdx     ; anon.20480550abd40f7c5a6ab6f2e04b754b.29.llvm.15187239529951727520 + 19440
XUL[0x5b71031] <+273>: movl   $0x51, %esi
XUL[0x5b71036] <+278>: callq  0x5ad5920                 ; std::panicking::begin_panic::h753c026e2313fa3b (.llvm.15187239529951727520)
XUL[0x5b7103b] <+283>: leaq   0x1b268d6(%rip), %rdi     ; anon.0f09dff20376d92cc45253bc08e814bf.166.llvm.15187239529951727520
XUL[0x5b71042] <+290>: callq  0x5f2bf80                 ; core::panicking::panic::haf72db49750c28ab (.llvm.15187239529951727520)
XUL[0x5b71047] <+295>: xorl   %ecx, %ecx
XUL[0x5b71049] <+297>: testq  %rax, %rax
XUL[0x5b7104c] <+300>: setns  %cl
XUL[0x5b7104f] <+303>: leaq   0x80c631(%rip), %rax      ; anon.20480550abd40f7c5a6ab6f2e04b754b.26.llvm.15187239529951727520
XUL[0x5b71056] <+310>: leaq   0x80c633(%rip), %rdx      ; anon.20480550abd40f7c5a6ab6f2e04b754b.27.llvm.15187239529951727520
XUL[0x5b7105d] <+317>: cmovnsq %rax, %rdx
XUL[0x5b71061] <+321>: leaq   0x7(%rcx,%rcx), %rax
XUL[0x5b71066] <+326>: movq   %rdx, -0x38(%rbp)
XUL[0x5b7106a] <+330>: movq   %rax, -0x30(%rbp)
XUL[0x5b7106e] <+334>: leaq   -0x38(%rbp), %rax
XUL[0x5b71072] <+338>: movq   %rax, -0x28(%rbp)
XUL[0x5b71076] <+342>: leaq   -0x3b970d(%rip), %rax     ; _$LT$$RF$T$u20$as$u20$core..fmt..Display$GT$::fmt::h0031dc61ce6c65fd (.llvm.15187239529951727520)
XUL[0x5b7107d] <+349>: movq   %rax, -0x20(%rbp)
XUL[0x5b71081] <+353>: leaq   0x1b167e0(%rip), %rax     ; anon.20480550abd40f7c5a6ab6f2e04b754b.25.llvm.15187239529951727520
XUL[0x5b71088] <+360>: movq   %rax, -0x68(%rbp)
XUL[0x5b7108c] <+364>: movq   $0x2, -0x60(%rbp)
XUL[0x5b71094] <+372>: movq   $0x0, -0x58(%rbp)
XUL[0x5b7109c] <+380>: leaq   -0x28(%rbp), %rax
XUL[0x5b710a0] <+384>: movq   %rax, -0x48(%rbp)
XUL[0x5b710a4] <+388>: movq   $0x1, -0x40(%rbp)
XUL[0x5b710ac] <+396>: leaq   0x1b167d5(%rip), %rsi     ; anon.20480550abd40f7c5a6ab6f2e04b754b.29.llvm.15187239529951727520
XUL[0x5b710b3] <+403>: leaq   -0x68(%rbp), %rdi
XUL[0x5b710b7] <+407>: callq  0x5f168c0                 ; std::panicking::begin_panic_fmt::h50d17c3950ddfe2a (.llvm.15187239529951727520)
XUL[0x5b710bc] <+412>: callq  0x5f251d0                 ; std::process::abort::h1bde3ba12adc17b8 (.llvm.15187239529951727520)
XUL[0x5b710c1] <+417>: nopw   %cs:(%rax,%rax)
XUL[0x5b710cb] <+427>: nopl   (%rax,%rax)

Note that the right function in that stack is Servo_SupportsRule_GetRules, but that function is pretty similar to Servo_StyleRule_GetStyle, so I suspect the linker has unified them. However, those functions should not drop any arc.

Looking around, it seems that the linker has also unified Servo_StyleRule_SetStyle (which does drop the Arc<>) too, since that symbol is at the same address. So I suspect that the LTO stuff has messed up, and has unified Servo_SupportsRule_GetRules, Servo_StyleRule_GetStyle, and Servo_StyleRule_SetStyle. I think that explains those crashes. But I'm not sure I can debug further to figure out why thinlto has unified those. Any idea?

Yeah, disas -n {any of those three symbols} ends up at the same address.

Flags: needinfo?(emilio)

David, I'm on Linux, any idea about how I could debug this? The way I'd go as of right now is trying to dump the LLVM IR for this with RUSTFLAGS="-C save-temps" and figure out if there's something weird.

Then I guess trying to use the same rust / llvm compiler as automation and cross-compile a mac build and try to debug the LTO stuff...

Flags: needinfo?(dmajor)

Normally when functions get unified, that's the work of the linker's ICF, not LTO per se. It might be good to confirm this. If you're pushing to tryserver, try adding ac_add_options --disable-icf. If you're building locally, ICF ought to be disabled already, if I'm reading compiler-opts.m4 correctly. Once we know that ICF is for sure disabled, disassemble the getter and setter again and see if it's any better.

Other than that, I wonder if there's any optimization remarks that we could enable (ala Comment 46) to see what LLVM is doing with Servo_SupportsRule_GetRules.

Flags: needinfo?(dmajor)

ni?ing myself to check when done.

Flags: needinfo?(emilio)

So, the xul section sizes came out identical between the builds, meaning the icf flag had no effect. I don't see any icf= in either of the build logs.

...aand I just remembered this is mac, so we're not using lld, so I have no familiarity at all here. Does ld64 default to icf?

There is a no_deduplicate / verbose_deduplicate in the ld64 man page, which looks like ld64-speak for icf, and it looks like clang explicitly disables it under certain conditions, which might explain why the crashes are only in shippable/opt builds? (Disclaimer: Not a mac user, just driving by since this looks interesting and I'm anxiously waiting for xLTO)

edit: fixed link

Thanks for the pointer! Your markdown for "disables it" points back to this bug, was that supposed to be a different link?

There are some r+ patches which didn't land and no activity in this bug for 2 weeks.
:froydnj, could you have a look please?
For more information, please visit auto_nag documentation.

Flags: needinfo?(nfroyd)
Flags: needinfo?(nfroyd)

Explicitly disabling ICF with ld64-specific options in the LTO setup does not seem to work:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=307546f26e77541f5855dff2046d2991e0ca1e86

which is, um, puzzling. I suppose the next step would be to see whether ICF was actually disabled by disassembling the appropriate functions in the binary.

The breakpad symbol file does say:

FUNC 5bd4e70 f6 0 Servo_StyleRule_GetStyle
FUNC 5bd4e70 1a1 0 Servo_StyleRule_SetStyle

which is, um, peculiar, and Servo_SupportsRule_GetRules doesn't seem to be listed in the symbol file.

The breakpad symbol file from the push linked in comment 67 says:

FUNC 5b70f20 f6 0 Servo_StyleRule_GetStyle
5b70f20 1e 2077 12325
...
FUNC 5b70f20 1a1 0 Servo_StyleRule_SetStyle
5b71016 8 35 12800
...

without any references to Servo_SupportsRule_GetRules either. Notice that those two functions are reputed to have the same start address, but very different sequences for their actual instructions + line numbers. I guess ICF there just completely botched the debug information?

But also, assuming that the information in the breakpad symbol file for each function is correct, how did those two functions get folded, since they have completely different lengths?

Fwiw, checking the most recent git history shows that in ld ~< 409.12 a custom hashing function atom_hashing::hash (and a sameFixups method I don't really understand) was used in dedup to compare Atoms, whereas the newest version also checks size and does an actual memcmp. Now I don't know why this was added, but it just looked like maybe a tiny chance of a hash collision exists. Again, just poking in the dark here.

It may be more likely that there's another bug like this one. I'm not even sure if that one's fixed in the version used in automation, though the circumstances don't seem to match to what happens here.

(In reply to Johannes Pfrang [:johnp] from comment #75)

Fwiw, checking the most recent git history shows that in ld ~< 409.12 a custom hashing function atom_hashing::hash (and a sameFixups method I don't really understand) was used in dedup to compare Atoms, whereas the newest version also checks size and does an actual memcmp. Now I don't know why this was added, but it just looked like maybe a tiny chance of a hash collision exists. Again, just poking in the dark here.

It may be more likely that there's another bug like this one. I'm not even sure if that one's fixed in the version used in automation, though the circumstances don't seem to match to what happens here.

That's a good point! Our cctools-port dates back to https://github.com/tpoechtrager/cctools-port/commit/8395d4b2c3350356e2fb02f5e04f4f463c7388df , so it's probably worth updating it to see if that fixes weird LTO bugs. We do have cctools-port#58 applied, but maybe the additional checks on atoms would fix some of the issues.

Well, that's disappointing: comment 77 obviously blew up, and after spending half a day getting the new version of cctools to build on our infra, we're still seeing peculiar stylo crashes:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=5626f8b1cb4c40aa90fa3aa174179da945b371cc

(Note that the cctools job successfully executed on a previous push, so the results from that previous job were used--assuming I understand taskcluster correctly.)

mochitest-4 left behind crash dumps, the first of which, 4D67623C-mumble, says the crashing stack is:

 0  XUL!Servo_CssRules_ListTypes [glue.rs:5626f8b1cb4c40aa90fa3aa174179da945b371cc : 1883 + 0x49]
    rax = 0x000000011264ac18   rdx = 0x0000000000000000
    rcx = 0xf4005a6066f99b11   rbx = 0x00000001298ac790
    rsi = 0x0000000000000001   rdi = 0x0000000000000000
    rbp = 0x00007fff550b4800   rsp = 0x00007fff550b47b0
     r8 = 0x0000000115f001b8    r9 = 0x000000000000000a
    r10 = 0x0000000000000000   r11 = 0x0000000000000000
    r12 = 0x000000011efb0500   r13 = 0x0000000000000000
    r14 = 0x000000011efb04f8   r15 = 0x00000001298ac7d0
    rip = 0x0000000110ab1eb9
    Found by: given as instruction pointer in context
 1  XUL!mozilla::dom::CSSSupportsRule::CSSSupportsRule(RefPtr<RawServoSupportsRule>, mozilla::StyleSheet*, mozilla::css::Rule*, unsigned int, unsigned int) [CSSSupportsRule.cpp:5626f8b1cb4c40aa90fa3aa174179da945b371cc : 23 + 0xe4]
    rbp = 0x00007fff550b4850   rsp = 0x00007fff550b4810
    rip = 0x000000010e5896e4
    Found by: previous frame's frame pointer
 2  XUL!mozilla::ServoCSSRuleList::GetRule(unsigned int) [ServoCSSRuleList.cpp:5626f8b1cb4c40aa90fa3aa174179da945b371cc : 81 + 0x8]
    rbp = 0x00007fff550b48c0   rsp = 0x00007fff550b4860
    rip = 0x000000010e5c4261
    Found by: previous frame's frame pointer
 3  XUL!mozilla::ServoStyleRuleMap::EnsureTable(mozilla::ServoStyleSet&) [ServoStyleRuleMap.cpp:5626f8b1cb4c40aa90fa3aa174179da945b371cc : 28 + 0xb4]
    rbp = 0x00007fff550b4910   rsp = 0x00007fff550b48d0
    rip = 0x000000010e96b06a
    Found by: previous frame's frame pointer
 4  XUL!mozilla::ServoStyleSet::StyleRuleMap() [ServoStyleSet.cpp:5626f8b1cb4c40aa90fa3aa174179da945b371cc : 1176 + 0xf]
    rbp = 0x00007fff550b4960   rsp = 0x00007fff550b4920
    rip = 0x000000010e5c8e6c
    Found by: previous frame's frame pointer
 5  XUL!mozilla::dom::InspectorUtils::GetCSSStyleRules(mozilla::dom::GlobalObject&, mozilla::dom::Element&, nsTSubstring<char16_t> const&, nsTArray<RefPtr<mozilla::BindingStyleRule> >&) [InspectorUtils.cpp:5626f8b1cb4c40aa90fa3aa174179da945b371cc : 183 + 0x5]
    rbp = 0x00007fff550b49e0   rsp = 0x00007fff550b4970
    rip = 0x000000010e968670
    Found by: previous frame's frame pointer
 6  XUL!mozilla::dom::InspectorUtils_Binding::getCSSStyleRules(JSContext*, unsigned int, JS::Value*) [InspectorUtilsBinding.cpp: : 2863 + 0x8]
    rbp = 0x00007fff550b4b50   rsp = 0x00007fff550b49f0
    rip = 0x000000010d48a772
    Found by: previous frame's frame pointer
 7  0x2a906ae7495f
    rbp = 0x00007fff550b4bb0   rsp = 0x00007fff550b4b60
    rip = 0x00002a906ae7495f
    Found by: previous frame's frame pointer

The second, 3A69FB35-mumble, says:

 0  XUL!servo_arc::Arc<T>::drop_slow + 0x80
    rax = 0x000000011b4b04f0   rdx = 0x00000001679e3f60
    rcx = 0x00000000000007fd   rbx = 0x00000001663fb2a8
    rsi = 0x00007fff58bde2b8   rdi = 0xe5e5e5e5e5e5e5e4
    rbp = 0x00007fff58bde1d0   rsp = 0x00007fff58bde1b0
     r8 = 0x0000000000000000    r9 = 0x000000000000000f
    r10 = 0x0000000000c60000   r11 = 0xffff80020c90d418
    r12 = 0x00000001679e3f60   r13 = 0x00000001663fb288
    r14 = 0x00000001663e9e50   r15 = 0x0000000000000000
    rip = 0x000000010cefdaf0
    Found by: given as instruction pointer in context
 1  XUL!Servo_Keyframe_GetStyle + 0xb3
    rbp = 0x00007fff58bde250   rsp = 0x00007fff58bde1e0
    rip = 0x000000010cf9bdc3
    Found by: previous frame's frame pointer
 2  XUL!mozilla::dom::CSSMozDocumentRule::CSSMozDocumentRule(RefPtr<RawServoMozDocumentRule>, mozilla::StyleSheet*, mozilla::css::Rule*, unsigned int, unsigned int) [CSSMozDocumentRule.cpp:5626f8b1cb4c40aa90fa3aa174179da945b371cc : 80 + 0x2d]
    rbp = 0x00007fff58bde2a0   rsp = 0x00007fff58bde260
    rip = 0x000000010aa5099d
    Found by: previous frame's frame pointer
 3  XUL!mozilla::ServoCSSRuleList::GetRule(unsigned int) [ServoCSSRuleList.cpp:5626f8b1cb4c40aa90fa3aa174179da945b371cc : 82 + 0x8]
    rbp = 0x00007fff58bde310   rsp = 0x00007fff58bde2b0
    rip = 0x000000010aa8d2d7
    Found by: previous frame's frame pointer
 4  XUL!nsTreeSanitizer::SanitizeChildren(nsINode*) [CSSRuleList.h:5626f8b1cb4c40aa90fa3aa174179da945b371cc : 28 + 0x10]
    rbp = 0x00007fff58bde5f0   rsp = 0x00007fff58bde320
    rip = 0x0000000108c5a72a
    Found by: previous frame's frame pointer
 5  XUL!mozilla::HTMLEditor::ParseFragment(nsTSubstring<char16_t> const&, nsAtom*, mozilla::dom::Document*, mozilla::dom::DocumentFragment**, bool) [HTMLEditorDataTransfer.cpp:5626f8b1cb4c40aa90fa3aa174179da945b371cc : 2448 + 0x10]
    rbp = 0x00007fff58bde640   rsp = 0x00007fff58bde600
    rip = 0x000000010a9eb404
    Found by: previous frame's frame pointer
 6  XUL!mozilla::HTMLEditor::DoInsertHTMLWithContext(nsTSubstring<char16_t> const&, nsTSubstring<char16_t> const&, nsTSubstring<char16_t> const&, nsTSubstring<char16_t> const&, mozilla::dom::Document*, mozilla::EditorDOMPointBase<nsCOMPtr<nsINode>, nsCOMPtr<nsIContent> > const&, bool, bool, bool) [HTMLEditorDataTransfer.cpp:5626f8b1cb4c40aa90fa3aa174179da945b371cc : 223 + 0x3e]
    rbp = 0x00007fff58bdea60   rsp = 0x00007fff58bde650
    rip = 0x000000010a9de4ff
    Found by: previous frame's frame pointer
 7  XUL!mozilla::HTMLEditor::InsertFromTransferable(nsITransferable*, mozilla::dom::Document*, nsTSubstring<char16_t> const&, nsTSubstring<char16_t> const&, bool, bool) [HTMLEditorDataTransfer.cpp:5626f8b1cb4c40aa90fa3aa174179da945b371cc : 1292 + 0x1c]
    rbp = 0x00007fff58bded10   rsp = 0x00007fff58bdea70
    rip = 0x000000010a9e7a0f
    Found by: previous frame's frame pointer
...

Servo_Keyframe_GetStyle appears to have been ICF'd to Servo_Keyframe_SetStyle; again, seemingly different function sizes, so ICF should have left those alone.

emilio, does the first stack make any more sense?

It seems like the first style should be Servo_SupportsRule_GetRules. It doesn't make any sense for that function to be unified with Servo_CSSRules_ListTypes, which writes to an array and what not: https://searchfox.org/mozilla-central/rev/7e158713cf5a8514fa8161dd4a239737b05da64d/servo/ports/geckolib/glue.rs#1879

Are we sure that the suspect address is really doing the work of a Servo_CSSRules_ListTypes, and it's not just the debug info lying to us? (A long shot, but it would be good to explicitly rule out.)

-verbose_deduplicate actually says there are 0 duplicated functions across all of Firefox. Which surprises me, but I think makes a certain amount of sense under LTO: LLVM actually contains a function-merging pass for doing ICF-esque things at compile-time, rather than link-time...which with LTO means that we're doing ICF not with the linker, but with the compiler that gets invoked by the linker. Which means that we're possibly dealing with a bug in the compiler, ugh.

Unfortunately the history for the MergeFunctions pass doesn't seem to contain anything obviously illuminating in terms of bugfixes that we might be missing:

https://github.com/llvm-mirror/llvm/commits/master/lib/Transforms/IPO/MergeFunctions.cpp

Nor does the logging story for the pass inspire confidence: getting basic information out of the pass like "these two functions are getting merged" requires a debug build (!). So trying to figure out what's going on requires a debug LLVM, or at least some patches to encourage other behavior.

Oh, and for added fun, the problems could be in Rust's LLVM, too (this seems unlikely to me, but worth mentioning for completeness).

This could be wrong, of course, but it seems at least plausible.

Rustc auto-enables merge-funcs:

    /// Determines how or whether the MergeFunctions LLVM pass should run for
    /// this target. Either "disabled", "trampolines", or "aliases".
    /// The MergeFunctions pass is generally useful, but some targets may need
    /// to opt out. The default is "aliases".
    ///
    /// Workaround for: https://github.com/rust-lang/rust/issues/57356
    pub merge_functions: MergeFunctions,

I gave a shot at disabling that pass here, we'll see: https://treeherder.mozilla.org/#/jobs?repo=try&revision=b9b59a888533dcdeff09591b11ae68f5412cbd46

(Edited since at first I was going to ask how to disable it but then I figured out that RUSTC_BOOTSTRAP would do the trick)

Flags: needinfo?(emilio)

Hmm, that didn't work... Nathan, any idea why? Should I try to make osx builds in tc use rust nightly? Or am I way off? :)

Flags: needinfo?(nfroyd)

(In reply to Emilio Cobos Álvarez (:emilio) from comment #84)

Hmm, that didn't work... Nathan, any idea why? Should I try to make osx builds in tc use rust nightly? Or am I way off? :)

I suppose you'd have to use nightly. You can modify the job itself:

https://searchfox.org/mozilla-central/source/taskcluster/ci/build/macosx.yml#294

to use the linux64-rust-nightly-macos toolchain, or you can modify the linux64-rust-macos toolchain:

https://searchfox.org/mozilla-central/source/taskcluster/ci/toolchain/rust.yml#123

(notice the toolchain-alias field) to repact nightly from some date rather than beta from some date.

I am trying to build things locally to investigate as well.

Flags: needinfo?(nfroyd)

Actually, I suppose you'd have to bump the linux64-rust-nightly-macos to use a newer date, so that it picks up:

https://github.com/rust-lang/rust/pull/60788

OK, so a local build says that libgkrust.a has distinct (LLVM bitcode) code for Servo_Keyframe_{Get,Set}Style and Servo_StyleRule_{Get,Set}Style are distinct functions. I haven't diff'd them, but eyeballing them says they should not be merged. It does look like they could almost share prefixes of code, depending on how optimization shakes out, but that seems hard to do.

...and looking at XUL in a local build, those functions from comment 87 have distinct addresses. So how are the crashes happening and why is the breakpad symbol file such garbage?

Garbage DWARF?

OK, something is clearly going wrong. In the llvm bitcode produced by rustc, we have:

froydnj@hawkeye:/opt/build/froydnj/build-macosx-cross/objs$ egrep 'Keyframe_...Style' gkrust.dis 
@Servo_MozDocumentRule_GetRules = unnamed_addr alias i64 (%"nsstring::nsACString"*), i64 (%"nsstring::nsACString"*)* @Servo_Keyframe_GetStyle
define i64 @Servo_Keyframe_GetStyle(%"nsstring::nsACString"* noalias nocapture nonnull readonly align 1) unnamed_addr #2 personality i32 (i32, i32, i64, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"*)* bitcast (void ()* @GkRust_Shutdown to i32 (i32, i32, i64, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"*)*) !dbg !4131763 {
define void @Servo_Keyframe_SetStyle(%"nsstring::nsACString"* noalias nocapture nonnull readonly align 1, %"nsstring::nsACString"* noalias nonnull readonly align 1) unnamed_addr #2 personality i32 (i32, i32, i64, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"*)* bitcast (void ()* @GkRust_Shutdown to i32 (i32, i32, i64, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"*)*) !dbg !4131973 {

Which seems kind of reasonable (I haven't looked at the code to make sure Servo_MozDocument_GetRules and Servo_Keyframe_GetStyle actually access things at identical offsets). But then in the actual library, we have:

froydnj@hawkeye:/opt/build/froydnj/build-macosx-cross/objs$ ~/.mozbuild/clang/bin/llvm-nm ../toolkit/library/XUL |egrep 'Servo_'
00000000059751f0 T _Servo_Keyframe_SetStyle
00000000059750f0 T _Servo_Keyframe_GetStyle
00000000059751f0 T _Servo_MozDocumentRule_GetRules

Notice that at the bitcode IR level, we have &Servo_Keyframe_GetStyle == &Servo_MozDocumentRule_GetRules, but by the time we make it to the final object, we have &Servo_Keyframe_SetStyle == &Servo_MozDocumentRule_GetRules, which is, um, wrong. We can look at how many Servo_* functions wind up deduplicated:

froydnj@hawkeye:/opt/build/froydnj/build-macosx-cross/objs$ ~/.mozbuild/clang/bin/llvm-nm ../toolkit/library/XUL |egrep 'Servo_' |sort -g -k 1 | uniq -D --check-chars=16
0000000005a03b70 T _Servo_CloneBasicShape
0000000005a03b70 T _Servo_UseCounters_Drop
0000000005cd1ec0 T _Servo_CssRules_Release
0000000005cd1ec0 T _Servo_MediaList_AddRef
0000000005cd2150 T _Servo_AnimationValue_Release
0000000005cd2150 T _Servo_CounterStyleRule_AddRef
0000000005cd2150 T _Servo_CssUrlData_AddRef
0000000005cd2150 T _Servo_DeclarationBlock_AddRef
0000000005cd2150 T _Servo_DocumentRule_AddRef
0000000005cd2150 T _Servo_FontFaceRule_AddRef
0000000005cd2150 T _Servo_FontFeatureValuesRule_AddRef
0000000005cd2150 T _Servo_ImportRule_AddRef
0000000005cd2150 T _Servo_Keyframe_AddRef
0000000005cd2150 T _Servo_KeyframesRule_AddRef
0000000005cd2150 T _Servo_MediaRule_AddRef
0000000005cd2150 T _Servo_NamespaceRule_AddRef
0000000005cd2150 T _Servo_PageRule_AddRef
0000000005cd2150 T _Servo_StyleRule_AddRef
0000000005cd2150 T _Servo_StyleSheetContents_AddRef
0000000005cd2150 T _Servo_SupportsRule_AddRef
00000000059751f0 T _Servo_Keyframe_SetStyle
00000000059751f0 T _Servo_MozDocumentRule_GetRules
0000000005956780 T _Servo_StyleRule_SetStyle
0000000005956780 T _Servo_SupportsRule_GetRules
0000000005976e10 T _Servo_NamespaceRule_GetPrefix
0000000005976e10 T _Servo_PageRule_GetStyle

Presumably Servo_StyleRule_SetStyle and Servo_SupportsRule_GetRules shouldn't have been folded together. Servo_CssRules_Release should also be a distinct function from Servo_MediaList_AddRef, I'd think? Same with Servo_AnimationValue_Release and Servo_CounterStyleRule_AddRef. Not sure about the Servo_CloneBasicShape and Servo_UseCounters_Drop equivalency.

I really hope this is an artifact of enabling cross-language LTO, and not an issue we're seeing today...

I wonder if I can convince ld64 to dump all the bitcode post-LLVM optimization, but before translation to object code and therefore, ideally, before ld64 optimizations run on the object code.

ld64 dumps all of its intermediate LLVM bitcode and post-LLVM-optimization Mach-O files (!) with -save-temps. (If you are playing along at home, have ~40GB free to do this, on top of the space for your normal objdir.) ld64 can also be convinced to pass in options to its invoked LLVM via -mllvm $OPTION (-Wl,-mllvm -Wl,$OPTION from the compiler), but I have not been able to get what I think is some debugging code run by doing so, so I must be doing something wrong.

For the merged functions in comment 90, you can see in the intermediate Mach-O file that LLVM produces (these are held in memory if you're not dumping with -save-temps) that everything is OK:

00000000005822b0 T _Servo_StyleRule_GetStyle
00000000005822b0 T _Servo_SupportsRule_GetRules
00000000005823b0 T _Servo_StyleRule_SetStyle
...
00000000005a0d20 T _Servo_Keyframe_GetStyle
00000000005a0d20 T _Servo_MozDocumentRule_GetRules
00000000005a0e20 T _Servo_Keyframe_SetStyle
...
00000000008fdad0 T _Servo_CssRules_AddRef
00000000008fdad0 T _Servo_MediaList_AddRef
00000000008fdaf0 T _Servo_CssRules_Release
...
000000000062f790 T _Servo_SharedMemoryBuilder_Drop
000000000062f790 T _Servo_UseCounters_Drop
000000000062f7a0 T _Servo_CloneBasicShape
...
00000000008fdd60 T _Servo_AnimationValue_AddRef
00000000008fdd60 T _Servo_CounterStyleRule_AddRef
00000000008fdd80 T _Servo_AnimationValue_Release

I think there must be some off-by-one error somewhere in ld64 itself based on this: the object files are not displaying the issues that the final XUL library is showing. We also see the pattern: the first two symbols have the same address and the third symbol has a different address...but the second and third symbols wind up (wrongly) having the same address in the final output file, whereas the first symbol has a distinct address in the output file.

We will need a fixed linker to enable this on Mac, but the fix is pretty simple and relatively easily justified.

The Mac linker calls symbols "atoms" and maintains a bunch of state hanging off different kinds of atoms. One of those bits (literally) of state tracks whether one atom is an alias of another:

https://github.com/tpoechtrager/cctools-port/blob/master/cctools/ld64/src/ld/ld.hpp#L1173

Alias information is used in a couple of different places, notably in ordering the atoms for final emission into the object file:

https://github.com/tpoechtrager/cctools-port/blob/master/cctools/ld64/src/ld/passes/order.cpp#L167-L211

You may have noticed that said bit is only enough space to store whether an atom is an alias of another, and not which atom is being aliased. You might imagine that this would be a problem, and you would be correct; we'll come back to this issue.

When we're doing LTO with the Mac linker, we load all of the LLVM bitcode and unleash LLVM on said bitcode. This step generates Mach-O files in memory, which then linker parses as a separate step, to pass the compiled code through the regular linker pipeline. This arrangement also enables the linker to elegantly handle non-bitcode files (e.g. our assembly XPConnect files, various media codecs, etc.).

(Side note: it might be worth exploring, in a cross-LTO setup, if we could just have Rust generate a bunch of stupid, unoptimized LLVM bitcode, which we would rely on the cross-language LTO side of things to clean up. Maybe this would make things faster--don't re-optimize already-optimized Rust bitcode--or maybe not.)

Under this model, when we load the Mach-O file:

https://github.com/tpoechtrager/cctools-port/blob/master/cctools/ld64/src/ld/parsers/lto_file.cpp#L940-L971

we want to keep track of two different sets of atoms: the atoms uncovered by the LLVM compilation, and the atoms contained in the Mach-O file itself. Maintaining correspondence between those sets is the responsibility of the AtomSyncer in the code fragment above. For our purposes, the only case of atoms from the Mach-O file we care about is this case:

https://github.com/tpoechtrager/cctools-port/blob/master/cctools/ld64/src/ld/parsers/lto_file.cpp#L1524-L1527

At this point, we have a Mach-O atom with some $NAME, and we also know there's an LLVM atom with an identical $NAME. We're going to tell the LLVM atom that its "compiled atom" is the Mach-O atom:

https://github.com/tpoechtrager/cctools-port/blob/master/cctools/ld64/src/ld/parsers/lto_file.cpp#L604-L613

You'll notice that one of the lines in the above fragment says: "update fields in ld::Atom to match newly constructed mach-o atom", which gets us to here:

https://github.com/tpoechtrager/cctools-port/blob/master/cctools/ld64/src/ld/ld.hpp#L1135-L1152

You may notice that this block of code does not copy over the alias bit. (I confess I'm not quite sure why this matters; the alias information is explicit in the LLVM bitcode, so one might think it should be associated with the LLVM atom, but it apparently doesn't make it into the actual atom data structures?)

We now have enough information to describe what happens:

  1. The LTO-compiled object files have three symbols, S1, S2, and S3 in that order; S1 is the "real" symbol and S2 is the alias to S1.
  2. As a result, S1 has a non-zero size (it has code associated with it), while S2's size is 0.
  3. When we load the compiled Mach-O files after LTO has taken place, S2 doesn't get marked as an alias.
  4. We order the symbols (this can be done for, e.g. optimizing load times). If S2 had been marked as an alias, it would have been sorted before S1. But that doesn't happen, because S2 is not marked as an alias. So the symbols maintain their order: S1, S2, S3.
  5. At some later point, we assign addresses to everything by looping over the symbols in order:

https://github.com/tpoechtrager/cctools-port/blob/master/cctools/ld64/src/ld/OutputFile.cpp#L313-L331

  1. We process S1 and advance offset by its size.
  2. We process S2, which has a 0 size, so offset doesn't get advanced.
  3. We process S3, which is assigned to the same address/offset as S2, which is wrong. Boom.

This doesn't happen today, because the alias information is correct all the way through the pipeline; we don't go through the bits in lto_file.cpp, above, that desynchronize the alias information (whew!).

The fix, as are all such fixes, is quite simple: we just need to copy over the _alias field in this function:

https://github.com/tpoechtrager/cctools-port/blob/master/cctools/ld64/src/ld/ld.hpp#L1135-L1152

and everything works as it should from there on out.

I haven't actually run all the tests or even pushed this to try, so it's possible there are still other bugs waiting. But a try run will have to wait until tomorrow.

...and a (mostly) green try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=ade3c06bd559e255e51e1b24f12fef443654e424

I don't know what to do about the dt-wr timeouts. I'd guess that cross-language LTO might be changing the timing behavior of certain functions (??), but I would not expect the timeouts to be quite so consistent. Brian, can we just disable the entire timing-out test file, or will you have a chance to look at bug 1502301 in the next couple of days?

Flags: needinfo?(bhackett1024)
Depends on: 1562953
Depends on: 1563204

(In reply to Nathan Froyd [:froydnj] from comment #94)

...and a (mostly) green try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=ade3c06bd559e255e51e1b24f12fef443654e424

I don't know what to do about the dt-wr timeouts. I'd guess that cross-language LTO might be changing the timing behavior of certain functions (??), but I would not expect the timeouts to be quite so consistent. Brian, can we just disable the entire timing-out test file, or will you have a chance to look at bug 1502301 in the next couple of days?

Don't worry about the dt-wr timeouts, these tests are tier 2 and shouldn't interfere with other development. I can look into them after things land.

Flags: needinfo?(bhackett1024)

OK, for reasons that I do not understand, the linux32 build has started failing:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=4657aeec9242ab1cbb8ce60358e77532d686f0cf

(The Android x86 debug fuzzing build is weird too, but that appears to be a Java thing, rather than something wrong with the patch.)

I think libjsrust.a must be a bunch of LLVM bitcode, and linking gdb-tests is not happening with -flto=thin, so things are going sideways. I do not understand why this isn't happening on other builds, though...

Depends on: 1563378

Linux32 builds are not busted anymore, but cross-osx ones are, which is weird:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=209c35a2c0adfc2e019f917c7e7f52985a249b38

(In reply to Mike Hommey [:glandium] from comment #97)

Linux32 builds are not busted anymore, but cross-osx ones are, which is weird:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=209c35a2c0adfc2e019f917c7e7f52985a249b38

I guess this is LLVM ERROR: ThinLTO modules with incompatible triples not supported again (perhaps similar to bug 1551690). Mike is this something you can continue to look into?

Flags: needinfo?(mh+mozilla)

So it turns out bug 1551690 was not enough. We do build with clang --target=x86_64-apple-darwin, and the triple in rust LLVM-IR is x86_64-apple-darwin... but the triple in the LLVM-IR emitted by clang is... x86_64-apple-macosx10.9.0.

Flags: needinfo?(mh+mozilla)

Funnily enough clang --target=x86_64-apple-darwin --print-target-triple prints x86_64-apple-darwin. (or --print-effective-triple for that matter)

Attachment #9069029 - Attachment description: Bug 1486042 - switch all existing `MOZ_LTO` configs to use cross-language LTO; r=#build → Bug 1486042 - switch all existing `MOZ_LTO` configs to use cross-language LTO;
Attachment #9069030 - Attachment description: Bug 1486042 - default clang-cl pgo to use cross-language LTO; r=#build → Bug 1486042 - default clang-cl pgo to use cross-language LTO;
Depends on: 1566309

I confirmed it works once we upgrade to rust 1.36, which, for some reason, was forgotten in bug 1563378. Bug 1566309 will fix that.

Bug 1565757 made me realize that cross-language LTO means libgkrust.a is not a real library anymore, which means the checks we run against it silently cannot find any problem they're supposed to find. The only test I know we run is check_networking from python/mozbuild/mozbuild/action/check_binary.py.

And FWIW, Rust 1.38 upgraded LLVM to a trunk version... this is going to be fun.

(In reply to Mike Hommey [:glandium] from comment #102)

Bug 1565757 made me realize that cross-language LTO means libgkrust.a is not a real library anymore, which means the checks we run against it silently cannot find any problem they're supposed to find. The only test I know we run is check_networking from python/mozbuild/mozbuild/action/check_binary.py.

As long as the linux64 opt builds stay non-LTOed, we're good to go, though. So I'm going to land this.

Pushed by mh@glandium.org:
https://hg.mozilla.org/integration/autoland/rev/402b26b7e514
switch all existing `MOZ_LTO` configs to use cross-language LTO; r=dmajor
https://hg.mozilla.org/integration/autoland/rev/8ba3c1292475
default clang-cl pgo to use cross-language LTO; r=dmajor
Status: REOPENED → RESOLVED
Closed: 5 years ago5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla70
Assignee: nobody → nfroyd

(In reply to Cristina Coroiu [:ccoroiu] from comment #107)

Since this patch landed, it started to perm fail:
Central: https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&selectedJob=257365891&resultStatus=pending%2Crunning%2Csuccess%2Ctestfailed%2Cbusted%2Cexception&searchStr=windows%2C10%2Cx64%2Copt%2Ctest-windows10-64%2Fopt-gtest-1proc%2C%28gtest%29&fromchange=e8a1932001eb18766bcf6e4c1c5fbb484b58b095

Autoland:
https://treeherder.mozilla.org/#/jobs?repo=autoland&tochange=914ff68da4e4f3e790472011ca53bf6ef457a240&fromchange=5fff2a9bf0785afbdb774c178135cc9e9ad18211&searchStr=windows%2C10%2Cx64%2Copt%2Ctest-windows10-64%2Fopt-gtest-1proc%2C%28gtest%29&selectedJob=257681107

Mike, can you please take a look?

The tests seem to be getting stuck at:

20:51:06     INFO -  TEST-START | TestPACMan.WhenTheDHCPResponseIsEmptyWPADDefaultsToStandardURL
20:56:06    ERROR -  gtest TEST-UNEXPECTED-FAIL | gtest | timed out after 300 seconds without output

Valentin is on PTO. :( Nhi, do you know who might be able to offer some insight about this test in Valentin's absence?

It's curious to me that this particular test is always the one that's timing out, but maybe there are some Rust URL peculiarities specific to this particular test?

Flags: needinfo?(mh+mozilla) → needinfo?(nhnguyen)
Regressions: 1567921
Regressions: 1567968
Regressions: 1568152

== Change summary for alert #21992 (as of Fri, 19 Jul 2019 17:31:28 GMT) ==

Improvements:

17% raptor-tp6-youtube-firefox loadtime windows10-64-shippable-qr opt 973.17 -> 803.88
16% raptor-tp6-youtube-firefox loadtime windows10-64-shippable opt 955.60 -> 800.38

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=21992

Regressions: 1568450

I think the regression was fixed by only enabling LTO for shippable builds, so I'm going to call comment 107 fixed.

Flags: needinfo?(nhnguyen)

Enabling LTO for shippable builds caused use to cancel the improvement gained in #110

== Change summary for alert #22207 (as of Wed, 31 Jul 2019 10:40:57 GMT) ==

Regressions:

20% raptor-tp6-youtube-firefox loadtime windows10-64-shippable opt 799.29 -> 955.71
17% raptor-tp6-youtube-firefox loadtime windows10-64-shippable-qr opt 806.48 -> 943.21
5% raptor-tp6-youtube-firefox windows10-64-shippable opt 532.82 -> 558.29
2% raptor-tp6-sheets-firefox loadtime windows10-64-shippable-qr opt 885.35 -> 905.42
2% raptor-tp6-facebook-firefox-cold loadtime windows10-64-shippable opt 830.75 -> 849.17
2% raptor-tp6-sheets-firefox loadtime windows10-64-shippable opt 881.21 -> 899.96
2% raptor-tp6-binast-instagram-firefox loadtime windows10-64-shippable-qr opt 435.08 -> 442.62

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=22207

As stated in https://bugzilla.mozilla.org/show_bug.cgi?id=1568450#c1 we will consider this cancellation acceptable

== Change summary for alert #22093 (as of Wed, 24 Jul 2019 00:33:50 GMT) ==

Regressions:

3% tabswitch windows10-64-shippable-qr opt e10s stylo 6.97 -> 7.21
2% tp5o_webext windows10-64-shippable opt e10s stylo 154.00 -> 157.60

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=22093

Stylo's Gecko wrapper duplicated some logic from the C++ side so
that the Rust compiler would be able to optimize better. Now that
we have xLTO, this kind of manual inlining should not be neccessary
anymore.

Blocks: 1577171

Comment on attachment 9088699 [details]
Bug 1486042 - Remove duplicated logic from Stylo's Gecko wrapper. r=emilio

Revision D43755 was moved to bug 1577171. Setting attachment 9088699 [details] to obsolete.

Attachment #9088699 - Attachment is obsolete: true
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: