Perform ThinLTO across C++/Rust language boundaries
Categories
(Firefox Build System :: General, defect)
Tracking
(firefox70 fixed)
Tracking | Status | |
---|---|---|
firefox70 | --- | fixed |
People
(Reporter: mwoerister, Assigned: froydnj)
References
Details
Attachments
(2 files, 2 obsolete files)
Quite a bit of work has gone into making the Rust compiler support linker-plugin based LTO [1]. At the moment, this seems to work when building Firefox locally on Windows with Clang and a nightly version of Rust -- that is, I can confirm instances of Rust functions being inlined into C++ call sites. However, in a try build David Major and I have done, no inlining seems to have happened (although we should confirm that WinDbg wasn't being too smart here somehow and used debuginfo to show an inlined call like a regular one). The builds in question are: 1. A baseline build for comparison. It uses thin (instead of full) LTO for the Rust code [2] 2. A build that adds cross-language LTO in addition to the baseline [3] The next steps are: - Confirm that there's actually something wrong with the try-build above, i.e. that no inlining is happening although it should. - Find out what the problem is. - Do proper benchmarking on Talos in order to find out if cross-language LTO is actually a win (I could not determine this conclusively with my local builds). [1] https://github.com/rust-lang/rust/issues/49879 [2] https://treeherder.mozilla.org/#/jobs?repo=try&revision=47311259165639022843e847ca5ceaec010f76e9 [3] https://treeherder.mozilla.org/#/jobs?repo=try&revision=c5cf2d38fb922f7d3a6447cb97b5a10db1acb9f6
> 1. A baseline build for comparison. It uses thin (instead of full) LTO for the Rust code [2]
The baseline build uses full LTO. Thin LTO doesn't work without cross-lang, for reasons I don't remember. Maybe it was the DLL import stuff?
Reporter | ||
Comment 2•6 years ago
|
||
Oh right, I remember now that you told me that. Thanks for the reminder, David!
Reporter | ||
Comment 3•6 years ago
|
||
I took another look at the c5cf2d38fb922f7d3a6447cb97b5a10db1acb9f6 build above and it looks like cross-language inlining is happening for the "Windows 2012 x64 opt" build but not for the "Windows 2012 x64 pgo" build: OPT: > nsSMILCSSProperty::IsPropertyAnimatable: > 00000001829FA860: 56 push rsi > 00000001829FA861: 48 83 EC 20 sub rsp,20h > 00000001829FA865: 89 CE mov esi,ecx > 00000001829FA867: E8 24 74 DF 01 call _ZN5style10properties19animated_properties29nscsspropertyid_is_animatable17ha3d2a6ab393e9968E > 00000001829FA86C: 84 C0 test al,al PGO: > nsSMILCSSProperty::IsPropertyAnimatable: > 00000001827738E0: 56 push rsi > 00000001827738E1: 48 83 EC 20 sub rsp,20h > 00000001827738E5: 89 CE mov esi,ecx > 00000001827738E7: E8 14 7D 78 01 call Servo_Property_IsAnimatable > 00000001827738EC: 84 C0 test al,al > 00000001827738EE: 74 1D je 000000018277390D So it's likely not something on CI that's causing the problem but rather the interaction between PGO and cross-lang LTO.
Reporter | ||
Comment 4•6 years ago
|
||
Here's an update on this: - The latest tests where done with a rather recent Clang and rustc (both end of Oct 2018). Both are based on LLVM 8.0svn which is required in order for them to be compatible. - xLTO (which is what I'll call cross-language LTO from now on) seems to work just fine in conjunction with PGO builds now. - I've been able to solve the problems around combining full LTO (for all Rust code) and ThinLTO (for everything, C++ and Rust together). This means that we now have comparable configurations for evaluating performance. One of the goals of xLTO is that it hopefully allows for not duplicating logic in both Rust and C++ because either version can be called from the other without negative performance implications. With this in mind, and with help from Emilio, I've removed some code duplication in the gecko wrapper in Stylo (see https://hg.mozilla.org/try/rev/33746f477ee4). The following try-builds were benchmarked: 1. [xLTO off, manual inlining off] https://treeherder.mozilla.org/#/jobs?repo=try&revision=0f6e7cb45dbe0934e43263058d52ae15988441c9 2. [xLTO off, manual inlining ON] https://treeherder.mozilla.org/#/jobs?repo=try&revision=fa86f94963255a7bff3bc6fef741736b16ab1b92 3. [xLTO ON, manual inlining off] https://treeherder.mozilla.org/#/jobs?repo=try&revision=ca457e1b9ef3e473bc9f23de5a2e0dacb7dc8364 4. [xLTO ON, manual inlining ON] https://treeherder.mozilla.org/#/jobs?repo=try&revision=b0dba7a6cee4d2c3651341bd6af5d51576d1d953 I think some interesting comparisons are: 1. The effect of manual inlining (without xLTO) - This is interesting because it should give an idea what kind of performance difference manual inlining gives in general. It seems to have mostly positive but also some negative effects. https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=0f6e7cb45dbe0934e43263058d52ae15988441c9&newProject=try&newRevision=fa86f94963255a7bff3bc6fef741736b16ab1b92&framework=1&showOnlyConfident=1 2. The effect of xLTO (without manual inlining) - This is interesting because it shows what kind of improvements xLTO can bring in general. It looks like it slightly improves performance in many cases, while a few regress slightly. However, many of the changes could also just be noise. https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=0f6e7cb45dbe0934e43263058d52ae15988441c9&newProject=try&newRevision=ca457e1b9ef3e473bc9f23de5a2e0dacb7dc8364&framework=1&showOnlyConfident=1 3. Finally: How does a build with xLTO and no manual inlining compare to the current state (no xLTO, code duplication) - This is the most likely configuration to replace the current one (i.e. replace manual work with a compiler optimizations). It looks like only the "tabpaint-from-parent pgo e10s stylo" benchmark is affected negatively, while some others show slight improvements. https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=fa86f94963255a7bff3bc6fef741736b16ab1b92&newProject=try&newRevision=ca457e1b9ef3e473bc9f23de5a2e0dacb7dc8364&framework=1&showOnlyConfident=1 So, judging from the numbers from (3), using xLTO might be a viable option to reduce the maintenance burden of performance sensitive code crossing the Rust/C++ boundary. It would be interesting though to take a closer look at that one regression. It looks like an important one?
Comment 5•6 years ago
|
||
(In reply to Michael Woerister from comment #4) > 3. Finally: How does a build with xLTO and no manual inlining compare to the > current state (no xLTO, code duplication) - This is the most likely > configuration to replace the current one (i.e. replace manual work with a > compiler optimizations). It looks like only the "tabpaint-from-parent pgo > e10s stylo" benchmark is affected negatively, while some others show slight > improvements. Mike, when looking at the effects of cross-language LTO we're seeing a regression in just tabpaint-parent [1]. Do you have any thoughts on how to look into this further and whether or not that would be an acceptable regression? [1] https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=fa86f94963255a7bff3bc6fef741736b16ab1b92&newProject=try&newRevision=ca457e1b9ef3e473bc9f23de5a2e0dacb7dc8364&framework=1&showOnlyConfident=1
Comment 6•6 years ago
|
||
So, just to re-synthesize, the change being proposed is link-time optimization, and when you compare against mozilla-central without link-time optimization, you're seeing a regression in Windows 10 tabpaint regression but only on PGO. I'm afraid this has all of the hallmarks of something that I'm not really equipped to diagnose. :/ PGO-only regressions are notoriously difficult to get to the root cause of, at least in my books. I know that some of the folks working with vchin have been looking at PGO optimizations lately... perhaps they'd have more insight here? Transferring ni? to acreskey.
Reporter | ||
Comment 7•6 years ago
|
||
To clarify, the "tabpaint opt e10s stylo" (i.e. non-pgo) benchmark shows roughly the same regression. It's just filtered out in the given link because of lower confidence. Looking at the graph [1], the benchmarks seems to be bi-modal. So the regression might not be a real one? [1] https://treeherder.mozilla.org/perf.html#/graphs?highlightedRevisions=fa86f9496325&highlightedRevisions=ca457e1b9ef3&series=%5B%22try%22,%22a59bffa7d0919fe8fc8b4b4aca26f3b8b81de87d%22,1,1%5D&timerange=2592000
Comment 9•6 years ago
|
||
But it's been suggested that I look into the replicates, let me take a look.
Comment 10•6 years ago
|
||
(In reply to Michael Woerister from comment #7) > https://treeherder.mozilla.org/perf.html#/ > graphs?highlightedRevisions=fa86f9496325&highlightedRevisions=ca457e1b9ef3&se > ries=%5B%22try%22,%22a59bffa7d0919fe8fc8b4b4aca26f3b8b81de87d%22,1, > 1%5D&timerange=2592000 Note that it's generally not a great idea to look at performance graphs on try, because they depend very much on what kind of things is pushed to try, and *very* regularly, sheriffs push mozilla-central with the beta configuration, which has very different performance characteristics. That alone could account for bimodality of performance graphs. That being said, in this particular case, the same graph, but for mozilla-inbound is just as bimodal.
Comment 11•6 years ago
|
||
(In reply to Michael Woerister from comment #4) > 3. Finally: How does a build with xLTO and no manual inlining compare to the > current state (no xLTO, code duplication) - This is the most likely > configuration to replace the current one (i.e. replace manual work with a > compiler optimizations). It looks like only the "tabpaint-from-parent pgo > e10s stylo" benchmark is affected negatively, while some others show slight > improvements. > https://treeherder.mozilla.org/perf.html#/ > compare?originalProject=try&originalRevision=fa86f94963255a7bff3bc6fef741736b > 16ab1b92&newProject=try&newRevision=ca457e1b9ef3e473bc9f23de5a2e0dacb7dc8364& > framework=1&showOnlyConfident=1 Is there a speedometer number for this comparison? Did we validate whether xLTO removes some code duplication? Looking at the benchmark that shows the regression it appears that xLTO causes tabpaint-from-parent to exhibit the higher bimodal behaviour more often. Is that common for this benchmark?
Reporter | ||
Comment 12•6 years ago
|
||
> Did we validate whether xLTO removes some code duplication?
I verified that xLTO has the expected effect, i.e. that function calls get inlined across language boundaries.
Comment 13•6 years ago
|
||
It looks like the artifacts and logs for this task have expired so I don't have access to the replicates. Given the noisiness of this particular test, I certainly can't discern a regression there.
Updated•6 years ago
|
Comment 14•5 years ago
|
||
Strictly speaking, I guess I landed using the wrong bug, but in any case this work was done as part of bug 1512723.
Comment 15•5 years ago
|
||
Let's use this bug to enable it on the full set of platforms.
Comment 16•5 years ago
|
||
1.34 hit release, can we enable on Linux and Mac now?
Comment 17•5 years ago
|
||
I believe so. (It was my understanding that glandium would do that so I haven't been paying too close attention to the details.)
Comment 18•5 years ago
|
||
(In reply to David Major [:dmajor] (low availability) from comment #17)
I believe so. (It was my understanding that glandium would do that so I haven't been paying too close attention to the details.)
glandium, is there anything blocking us from enabling on all platforms now or can we update to 1.34 everywhere?
Comment 19•5 years ago
|
||
We switched to 1.34 on all platforms, already. The same as bug 1512723 can be done on other platforms (i.e. adjusting mozconfigs)
A generic long term way to support this is another matter. It's on my todo list, but not currently at the top.
Comment 20•5 years ago
|
||
Comment 21•5 years ago
|
||
Comment 22•5 years ago
•
|
||
Updated to exclude linux64-aarch64 and win32, win64-aarch64 and osx updated to gate off of MOZ_LTO
:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=9af660f828a281acaac9e150cbad0f855b373c22
Nathan, there's a linker error on linux32. I'm not sure it's worth looking into or if we should just exclude linux32 from xLTO, but you might want to be aware. I'm going on PTO for a bit, can you help finish this off if there need to be other changes?
Comment 23•5 years ago
|
||
(In reply to Eric Rahm [:erahm] from comment #22)
Updated to exclude linux64-aarch64 and win32, win64-aarch64 and osx updated to gate off of
MOZ_LTO
:https://treeherder.mozilla.org/#/jobs?repo=try&revision=9af660f828a281acaac9e150cbad0f855b373c22
Nathan, there's a linker error on linux32. I'm not sure it's worth looking into or if we should just exclude linux32 from xLTO, but you might want to be aware. I'm going on PTO for a bit, can you help finish this off if there need to be other changes?
Linux32 and OSX both have geckodriver issues, so we probably need to look into that.
glandium, it looks like win64-aarch64 is upset that we're specifying RUSTFLAGS
. You're most familiar with this setup, can you take a look at it? I'm assuming it has something to do with repackaging.
Comment 24•5 years ago
|
||
You need to set RUSTFLAGS before build/mozconfig.no-compile resets it.
Assignee | ||
Comment 25•5 years ago
|
||
The aarch64 windows thing is trivial (need to check USE_ARTIFACT
before setting RUSTFLAGS
). The linux32 and OS X cases are a little more interesting.
The Rust backend adds the -plugin-opt
arguments that ld
is complaining about:
Reading through the binutils ld code, it looks like the -plugin-opt
parsing code expects whatever invoked ld
to have already provided -plugin
; if there's no plugin registered, that's the only way -plugin-opt
parsing can fail. -plugin
isn't being provided on linux32 and OS X, so presumably that's why things are failing.
Rust's documentation says that things magically work with lld
and that you need to be explicit about some other linker plugin if you're using a different linker (-Clinker-plugin-lto=/path/to/file
). That would explain why things work on Windows. But I thought we didn't use lld
in automation on our Linux platforms (or OS X, for that matter), so I don't know why the Linux x86-64 builds are working.
And...both the linux32 and OS X builds successfully linked gkrust (with -Clinker-plugin-lto
), which should have failed given the above, since we're not using lld
in either case.
So why are only host binaries the problem here? I would have expected that:
https://searchfox.org/mozilla-central/source/config/makefiles/rust.mk#133-138
meant that we didn't pass down a set RUSTFLAGS
to host binaries, but that does not seem to be the case?
Comment 26•5 years ago
|
||
(In reply to Nathan Froyd [:froydnj] from comment #25)
The aarch64 windows thing is trivial (need to check
USE_ARTIFACT
before settingRUSTFLAGS
).
Or comment 24.
The linux32 and OS X cases are a little more interesting.
The Rust backend adds the
-plugin-opt
arguments thatld
is complaining about:Reading through the binutils ld code, it looks like the
-plugin-opt
parsing code expects whatever invokedld
to have already provided-plugin
; if there's no plugin registered, that's the only way-plugin-opt
parsing can fail.-plugin
isn't being provided on linux32 and OS X, so presumably that's why things are failing.Rust's documentation says that things magically work with
lld
and that you need to be explicit about some other linker plugin if you're using a different linker (-Clinker-plugin-lto=/path/to/file
). That would explain why things work on Windows. But I thought we didn't uselld
in automation on our Linux platforms (or OS X, for that matter), so I don't know why the Linux x86-64 builds are working.And...both the linux32 and OS X builds successfully linked gkrust (with
-Clinker-plugin-lto
), which should have failed given the above, since we're not usinglld
in either case.
gkrust is also not linked by ld, since it's a static library.
So why are only host binaries the problem here? I would have expected that:
https://searchfox.org/mozilla-central/source/config/makefiles/rust.mk#133-138
meant that we didn't pass down a set
RUSTFLAGS
to host binaries, but that does not seem to be the case?
The problem for linux32 is that it's a 1-tier PGO build, and it fails during the profile-generate pass, where LTO is disabled as far as C++ is concerned. So -flto is not passed to clang, so it doesn't add -Wl,-plugin. The x86-64 build is 3-tier PGO, so LTO is enabled in the relevant build.
On mac, the problem is that the linker is ld64, and the option to pass flags to the plugin is different. In fact, I'm not even sure there's one to pass arbitrary flags.
I guess we're back to "we shouldn't try to LTO host programs in the first place" land.
Comment 27•5 years ago
|
||
That said, it seems there's also an issue to open on the rust side. I'll come up with a small test case.
Comment 28•5 years ago
|
||
Comment 29•5 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #26)
I guess we're back to "we shouldn't try to LTO host programs in the first place" land.
or more accurately "host programs and geckodriver", because, here's the funny thing: geckodriver is not a host program (and I /guess/ the only reason for that is to have a mac geckodriver).
Assignee | ||
Comment 30•5 years ago
|
||
I went with glandium's --enable-lto=cross
idea, which was not that hard to implement, and got to:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=7cf9ae47c82291c8d9fdd4bdedfd1bcc42081861
The Mac failures are because there's some kind of mismatch between what Rust thinks the target is and what Clang thinks the target is? Or what the linker thinks the target is vs. some of the bitfiles? Not totally sure.
The Win32 failure is from a bunch of non-existent symbols in the Stylo Rust/C++ glue code. Somebody is dropping an extern "C"
somewhere along the way, perhaps? I'm a little confused as to why that would happen on win32 and not win64, but perhaps people haven't really pushed on LTO on win32...
Assignee | ||
Comment 31•5 years ago
|
||
I am not sure what's going on yet. emilio and dmajor pointed out that win32 needs _
prefixes, whereas win64 does not, so that would explain why win64 works and win32 does not. And emilio pointed out that bindgen uses the sekret \u{1]
prefix on external symbols so LLVM doesn't touch the mangling, which also seems involved somehow (since we use #[link_name="\u{1}$NAME"]
on Rust names for C++-defined things).
But trying to write a simple Rust/C++ LTO example did not trigger the problem. A win32 build is running right now, and ideally inspecting the generated bindings will point out what I might be doing differently than the actual problem scenario.
Reporter | ||
Comment 32•5 years ago
|
||
It seems that Rust code is compiled for x86_64-apple-darwin
while C code is compiled for x86_64-apple-darwin11
. The LLVM ERROR: ThinLTO modules with incompatible triples not supported
looks like the LLVM linker plugin then refuses to mix modules for the different targets together. rustc
does not seem to have the darwin11
target: https://github.com/rust-lang/rust/tree/master/src/librustc_target/spec.
Assignee | ||
Comment 33•5 years ago
|
||
(In reply to Michael Woerister from comment #32)
It seems that Rust code is compiled for
x86_64-apple-darwin
while C code is compiled forx86_64-apple-darwin11
. TheLLVM ERROR: ThinLTO modules with incompatible triples not supported
looks like the LLVM linker plugin then refuses to mix modules for the different targets together.rustc
does not seem to have thedarwin11
target: https://github.com/rust-lang/rust/tree/master/src/librustc_target/spec.
I went through and made everything on the C++ side use (what I thought) was x86_64-apple-darwin
, or at least x86_64-darwin
. I still got errors.
I patched clang to actually inform me what the mismatched triples were and got a surprising answer: "ThinLTO modules with incompatible triples not supported: builder x86_64-unknown-macosx10.9.0 module triple x86_64-apple-darwin".
A little more investigation says that clang --target=x86_64-darwin
tags its bitcode with the x86_64-unknown-macosx10.9.0
triple. Actually, that's not quite right, because clang
will apparently pull the version out of the SDK (10.11 in our case) unless it's overriden by MACOSX_DEPLOYMENT_TARGET
, which we set to 10.9. Groveling through the clang source, it looks like it has compiler magic to look at the SDK in use and extract the version from that, and put together a "real triple" that gets used. Or from MACOSX_DEPLOYMENT_TARGET
, depending.
I am not sure whether we should do the equivalent bits on the Rust side. Handling MACOSX_DEPLOYMENT_TARGET
seems kind of reasonable, but groveling through the SDK seems like a bit much. I wonder if it would work to run rustc --target=x86_64-apple-darwin -C llvm-args=--target=x86_64-unknown-macosx10.9.0
?
Assignee | ||
Comment 34•5 years ago
|
||
Or we can work around it on the C++ side by doing clang --target=blah ... -Xclang -triple -Xclang x86_64-apple-darwin
, which will put the right triple in the bitcode.
Assignee | ||
Comment 35•5 years ago
|
||
I filed https://github.com/rust-lang/rust/issues/60235 for the Rust side of things. The -C llvm-args
idea did not work out; I suppose we can try hacking around this with the -Xclang
hack for now?
Assignee | ||
Comment 36•5 years ago
|
||
Sigh, the -Xclang
hack works on simple examples, but with a full Firefox build, we get lots of errors about TLS not being supported on the target. So that is a dead end.
Reporter | ||
Comment 37•5 years ago
|
||
I did a more structured investigation of the x86 Windows problem and could confirm my initial suspicions:
bindgen
askslibclang
for the mangled name of C entities and then generates Rust bindings that hard-code that mangled name via the#[link_name]
attribute.- For
extern "C"
entities on 32bit Windows the mangling happens when LLVM IR is lowered to machine code, so the LLVM IR usually still contains the unmangled name. - This leads to the situation where the Rust side of things has calls to the mangled name in LLVM IR but the C side of things has definitions with the unmangled name, e.g. Rust has a call to
_Gecko_ReleaseAtom
but the C definition has the nameGecko_ReleaseAtom
. - In a non-cross-language LTO setting this works out because by the time the linker sees the names, the C side has been mangled too.
- In a cross-language LTO setting, however, the linker operates on the LLVM IR level, where the mismatch is still present.
On x64 Windows this is not a problem because no mangling happens during lowering to machine code. I suspect, however, that we'll run into a similar problem on macOS, where LLVM adds an underscore prefix to symbol names too.
I was able to fix the problem locally by modifying bindgen
to not emit #[link_name]
attributes if the mangling matched the x86 Windows mangling. That is also my suggested solution to the problem. Hopefully there is a way of implementing this robustly in bindgen
.
Assignee | ||
Comment 38•5 years ago
|
||
Emilio, do you have an opinion on the best way to solve comment 37 in bindgen?
Comment 39•5 years ago
|
||
I talked with mw this morning. I pointed him to here where we can presumably discern explicit extern "C"
vs. non-extern "C" functions in C++.
Here is where we get the mangled cursor with libclang. I'm not sure what Michael has implemented yet, but it should be around there, presumably.
Reporter | ||
Comment 40•5 years ago
|
||
What I did for my proof-of-concept fix was to do the following check:
let skip_link_name_attr = match abi {
Abi::C => mangled == format!("_{}", canonical_name),
Abi::FastCall | Abi::StdCall => panic!("not yet implemented"),
_ => false,
};
here https://github.com/rust-lang/rust-bindgen/blob/3bdfcf7992041acbc7a9278e82d6def8cbd88952/src/codegen/mod.rs#L3394, which is neither very efficient nor very elegant :)
Comment 41•5 years ago
•
|
||
If you did that like:
let skip_link_name_attr = mangled == canonical_name ||
(abi == Abi::C &&
!mangled.is_empty() &&
&mangled[1..] == canonical_name);
if !skip_link_name_attr {
attributes.push(..);
}
It'd be reasonably ok, I think. That being said... Is there any kind of C++ mangling in any target platform that doesn't have this weird backend mangling that may end up legitimately prepending just an underscore or such?
Assignee | ||
Comment 42•5 years ago
|
||
Cross-language LTO works for Mac with a motley stack of patches and a bleeding-edge rustc
:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=06e47b0d019ca2b5268a03e70d0e0db8ac8bb87f
Mac has the same prefixing convention as win32 does, but I believe the underscores actually wind up in the bitcode files in all cases (that is, they're not implicit as they are on win32), so we don't wind up with the same set of issues as we have on win32.
Comment 43•5 years ago
|
||
summary |
Note: Michael has a PR up to fix the Win32 issue.
So I think what's left is:
- Bug 1546438 needs to stick (currently on autoland)
- My patch needs to be replaced with Nathan's patch stack (possibly split out as blockers)
- mw's PR needs to land
- A new version of bindgen needs to be released, we need to update to it
- Wait for the rust fix to hit beta (1.36 goes to beta on May 23rd I believe)
The last point isn't clear to me. Can we switch to nightly rust for our nightly Firefox builds?
Comment 44•5 years ago
|
||
For reference the rust fix in point 5 is the issue referred to in comment 35.
Updated•5 years ago
|
Assignee | ||
Comment 45•5 years ago
|
||
(In reply to Eric Rahm [:erahm] from comment #43)
- Wait for the rust fix to hit beta (1.36 goes to beta on May 23rd I believe)
The last point isn't clear to me. Can we switch to nightly rust for our nightly Firefox builds?
This would be an interesting change to make.
The Rust side fix has been nominated for uplift to beta 1.35, which would be usable in infra in a couple of weeks:
https://github.com/rust-lang/rust/pull/60378#issuecomment-492363692
Reporter | ||
Comment 46•5 years ago
|
||
Regarding macOS: I looked into this some more because I was surprised that it worked without the rust-bindgen patch. I verified that the leading underscore for symbol names is not present at the LLVM IR (e.g. the definition is called Gecko_ReleaseAtom
, not _Gecko_ReleaseAtom
). Rust code, on the other hand, contains calls to @"\01_Gecko_ReleaseAtom"
, i.e. with the leading underscore - so pretty much the same situation as on x86 Windows.
So why do things work then? My guess is that the linker on macOS is using the ThinLTO plugin a bit differently than LLD, in a way that hides the mismatch. I managed to make the LLVM plugin emit optimization remarks and it is full of messages like:
---- !Missed
-Pass: inline
-Name: NoDefinition
-DebugLoc: { File: 'servo/components/style/gecko_string_cache/mod.rs',
- Line: 454, Column: 16 }
-Function: _ZN4core3ptr18real_drop_in_place17h49eb718801f38d81E
-Args:
: - Callee: _Gecko_ReleaseAtom
- - String: ' will not be inlined into '
- - Caller: _ZN4core3ptr18real_drop_in_place17h49eb718801f38d81E
- DebugLoc: { File: '/rustc/4443957f272e304e083a8d98583e608d65a712aa/src/libcore/ptr.rs',
- Line: 195, Column: 0 }
- - String: ' because its definition is unavailable'
That is, it can't find a definition with the name _Gecko_ReleaseAtom
(because it would have to look under Gecko_ReleaseAtom
) and thus just ignores the call. Later the Gecko_ReleaseAtom
function is lowered to machine code with the mangled name at which point the linker can find it.
The optimization remarks also contain cases where inlining happens because the names match up (since both caller and callee are in C/C++):
---- !Passed
-Pass: inline
-Name: Inlined
-DebugLoc: { File: 'dist/include/nsAtom.h', Line: 168, Column: 9 }
:Function: Gecko_ReleaseAtom
-Args:
- - Callee: _ZN13nsDynamicAtom11GCAtomTableEv
- DebugLoc: { File: 'xpcom/ds/nsAtomTable.cpp', Line: 452, Column: 0 }
- - String: ' inlined into '
: - Caller: Gecko_ReleaseAtom
- DebugLoc: { File: 'layout/style/GeckoBindings.cpp', Line: 961,
- Column: 0 }
- - String: ' with '
- - String: '(cost='
- - Cost: '110'
- - String: ', threshold='
- - Threshold: '225'
- - String: ')'
I'll take another look once we've updated rust-bindgen
.
Comment 47•5 years ago
|
||
Latest status:
- The rust-bindgen fix landed, we've updated to the latest version in-tree
- The rust team is not going to uplift the target triple rustc fixes. Given this is for mac only that's okay, we can just wait a few days for 1.36 to hit beta and the switch mac over to using that
- The remaining blocker, bug 1551690, is pending review
Next steps:
- Michael is going to verify that mac builds are inlined as expected (background in comment 46)
- Put up the final patches to
--enable-lto=cross
Reporter | ||
Comment 48•5 years ago
|
||
I can confirm that a local macOS build with the new rust-bindgen version successfully performs cross-language LTO. LLVM optimization remarks contain many instances of C++ code being inlined into Rust code, e.g.:
--- !Passed
Pass: inline
Name: Inlined
DebugLoc: { File: '/Users/mw/gecko-dev/servo/components/style/gecko_string_cache/mod.rs',
Line: 455, Column: 16 }
Function: Servo_IsCssPropertyRecordedInUseCounter
Args:
- Callee: Gecko_ReleaseAtom
DebugLoc: { File: 'layout/style/GeckoBindings.cpp', Line: 961,
Column: 0 }
- String: ' inlined into '
- Caller: Servo_IsCssPropertyRecordedInUseCounter
DebugLoc: { File: 'servo/ports/geckolib/glue.rs', Line: 6453,
Column: 0 }
- String: ' with '
- String: '(cost='
- Cost: '160'
- String: ', threshold='
- Threshold: '225'
- String: ')'
Assignee | ||
Comment 49•5 years ago
|
||
This change is a no-op for win64 configs, as they had this feature before.
Assignee | ||
Comment 50•5 years ago
|
||
This change is a little bit of a cheat, because of course MSVC doesn't
do cross-language LTO by default, but it seems consistent.
Depends on D33317
Comment 51•5 years ago
|
||
Pushed by nfroyd@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/d083a8bd98ed switch all existing `MOZ_LTO` configs to use cross-language LTO; r=dmajor https://hg.mozilla.org/integration/autoland/rev/5e85998c4d97 default clang-cl pgo to use cross-language LTO; r=dmajor
Comment 52•5 years ago
|
||
Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=249494231&repo=autoland&lineNumber=978
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=249494232&repo=autoland&lineNumber=3318
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=249495313&repo=autoland&lineNumber=5087
Backout link: https://hg.mozilla.org/integration/autoland/rev/fb7b087ff3035227a0f9c6b0cf78375f32daaada
23:01:12 INFO - PROCESS-CRASH | browser/components/extensions/test/browser/browser_ext_devtools_inspectedWindow_eval_bindings.js | application crashed [@ servo_arc::Arc<T>::drop_slow + 0x80]
23:01:12 INFO - Crash dump filename: /var/folders/lv/vhwyrgjd40s972dbfck2fnk000000w/T/tmpbxgx3n.mozrunner/minidumps/0281DDD2-556E-4B8D-A3BE-7F9EB864FCA6.dmp
23:01:12 INFO - Operating system: Mac OS X
23:01:12 INFO - 10.10.5 14F27
23:01:12 INFO - CPU: amd64
23:01:12 INFO - family 6 model 69 stepping 1
23:01:12 INFO - 4 CPUs
23:01:12 INFO -
23:01:12 INFO - GPU: UNKNOWN
23:01:12 INFO -
23:01:12 INFO - Crash reason: EXC_BAD_ACCESS / EXC_I386_GPFLT
23:01:12 INFO - Crash address: 0x0
23:01:12 INFO - Process uptime: 130 seconds
23:01:12 INFO -
23:01:12 INFO - Thread 0 (crashed)
23:01:12 INFO - 0 XUL!servo_arc::Arc<T>::drop_slow + 0x80
23:01:12 INFO - rax = 0x0000000000000000 rdx = 0x0000000000000002
23:01:12 INFO - rcx = 0x000000000000001b rbx = 0x00000001346eb5d0
23:01:12 INFO - rsi = 0x0000000000800000 rdi = 0xe5e5e5e5e5e5e5e4
23:01:12 INFO - rbp = 0x00007fff5b6bea70 rsp = 0x00007fff5b6bea50
23:01:12 INFO - r8 = 0x00000001046000b8 r9 = 0x000000000000000a
23:01:12 INFO - r10 = 0x000000010c2b3c60 r11 = 0x0000000000000007
23:01:12 INFO - r12 = 0x000000012714c700 r13 = 0x000000012c4dd748
23:01:12 INFO - r14 = 0x0000000132e0af40 r15 = 0x0000000000000000
23:01:12 INFO - rip = 0x000000010a684290
23:01:12 INFO - Found by: given as instruction pointer in context
23:01:12 INFO - 1 XUL!Servo_StyleRule_GetStyle [glue.rs:5e85998c4d97a236e8eaf1aaff622a979f60d789 : 2073 + 0x21]
23:01:12 INFO - rbp = 0x00007fff5b6beaf0 rsp = 0x00007fff5b6bea80
23:01:12 INFO - rip = 0x000000010a703fd3
23:01:12 INFO - Found by: previous frame's frame pointer
23:01:12 INFO - 2 XUL!mozilla::dom::CSSSupportsRule::CSSSupportsRule(RefPtr<RawServoSupportsRule>, mozilla::StyleSheet*, mozilla::css::Rule*, unsigned int, unsigned int) [CSSSupportsRule.cpp:5e85998c4d97a236e8eaf1aaff622a979f60d789 : 23 + 0x2d]
23:01:12 INFO - rbp = 0x00007fff5b6beb40 rsp = 0x00007fff5b6beb00
23:01:12 INFO - rip = 0x00000001081e9fdd
23:01:12 INFO - Found by: previous frame's frame pointer
23:01:12 INFO - 3 XUL!mozilla::ServoCSSRuleList::GetRule(unsigned int) [ServoCSSRuleList.cpp:5e85998c4d97a236e8eaf1aaff622a979f60d789 : 81 + 0x8]
23:01:12 INFO - rbp = 0x00007fff5b6bebb0 rsp = 0x00007fff5b6beb50
23:01:12 INFO - rip = 0x0000000108224bb1
23:01:12 INFO - Found by: previous frame's frame pointer
23:01:12 INFO - 4 XUL!mozilla::dom::CSSRuleList_Binding::DOMProxyHandler::getOwnPropDescriptor(JSContext*, JS::Handle<JSObject*>, JS::Handle<JS::PropertyKey>, bool, JS::MutableHandle<JS::PropertyDescriptor>) const [CSSRuleListBinding.cpp: : 289 + 0x5]
23:01:12 INFO - rbp = 0x00007fff5b6bec20 rsp = 0x00007fff5b6bebc0
23:01:12 INFO - rip = 0x00000001069d79ca
23:01:12 INFO - Found by: previous frame's frame pointer
23:01:12 INFO - 5 XUL!mozilla::dom::XrayResolveOwnProperty(JSContext*, JS::Handle<JSObject*>, JS::Handle<JSObject*>, JS::Handle<JS::PropertyKey>, JS::MutableHandle<JS::PropertyDescriptor>, bool&) [BindingUtils.cpp:5e85998c4d97a236e8eaf1aaff622a979f60d789 : 1680 + 0x6]
23:01:12 INFO - rbp = 0x00007fff5b6becb0 rsp = 0x00007fff5b6bec30
23:01:12 INFO - rip = 0x0000000107120273
23:01:12 INFO - Found by: previous frame's frame pointer
23:01:12 INFO - 6 XUL!xpc::DOMXrayTraits::resolveOwnProperty(JSContext*, JS::Handle<JSObject*>, JS::Handle<JSObject*>, JS::Handle<JSObject*>, JS::Handle<JS::PropertyKey>, JS::MutableHandle<JS::PropertyDescriptor>) [XrayWrapper.cpp:5e85998c4d97a236e8eaf1aaff622a979f60d789 : 1676 + 0x11]
23:01:12 INFO - rbp = 0x00007fff5b6bed30 rsp = 0x00007fff5b6becc0
23:01:12 INFO - rip = 0x0000000105960d5d
23:01:12 INFO - Found by: previous frame's frame pointer
23:01:12 INFO - 7 XUL!xpc::XrayWrapper<js::CrossCompartmentWrapper, xpc::DOMXrayTraits>::getOwnPropertyDescriptor(JSContext*, JS::Handle<JSObject*>, JS::Handle<JS::PropertyKey>, JS::MutableHandle<JS::PropertyDescriptor>) const [XrayWrapper.cpp:5e85998c4d97a236e8eaf1aaff622a979f60d789 : 1884 + 0x1a]
23:01:12 INFO - rbp = 0x00007fff5b6bedc0 rsp = 0x00007fff5b6bed40
23:01:12 INFO - rip = 0x000000010596088b
23:01:12 INFO - Found by: previous frame's frame pointer
23:01:12 INFO - 8 XUL!xpc::XrayWrapper<js::CrossCompartmentWrapper, xpc::DOMXrayTraits>::hasOwn(JSContext*, JS::Handle<JSObject*>, JS::Handle<JS::PropertyKey>, bool*) const [XrayWrapper.cpp:5e85998c4d97a236e8eaf1aaff622a979f60d789 : 2115 + 0x5b]
23:01:12 INFO - rbp = 0x00007fff5b6bee20 rsp = 0x00007fff5b6bedd0
23:01:12 INFO - rip = 0x00000001059638f5
23:01:12 INFO - Found by: previous frame's frame pointer
23:01:12 INFO - 9 XUL!js::ProxyGetPropertyByValue(JSContext*, JS::Handle<JSObject*>, JS::Handle<JS::Value>, JS::MutableHandle<JS::Value>) [Proxy.cpp:5e85998c4d97a236e8eaf1aaff622a979f60d789 : 369 + 0x21]
23:01:12 INFO - rbp = 0x00007fff5b6beed0 rsp = 0x00007fff5b6bee30
23:01:12 INFO - rip = 0x0000000109c00b66
23:01:12 INFO - Found by: previous frame's frame pointer
Assignee | ||
Comment 53•5 years ago
|
||
Ugh. Emilio, does anything stand out to you in that crash stack?
Comment 54•5 years ago
|
||
Not really, that stack looks perfectly normal to me. :(
Though we've seen crash stacks with drop_slow
near null in the past, and I don't think we did figure out how those could really happen. The crash in CSSPageRule::~CSSPageRule
looks a bit more scary even...
Looks like it also fails some tests and there are some other crashes that may be easier to debug under Servo_DeclarationBlock_Clone
. Seems like that's panicking somewhere but I don't see the panic message anywhere:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=249494232&repo=autoland&lineNumber=3318
Comment 55•5 years ago
|
||
Emilio, with Nathan out for two weeks and mw out for two months is this something you have time to look into further? This seems to be the final blocker for landing xLTO.
Comment 56•5 years ago
|
||
So I tried to take a look at some of the minidumps from Linux, in particular this one.
I followed the docs here and tried to use minidump-2-core
, but it crashed. It looked like a simple null-check was missing so I applied this and re-built it:
diff --git a/src/tools/linux/md2core/minidump-2-core.cc b/src/tools/linux/md2core/minidump-2-core.cc
index 941586e9..72680c5f 100644
--- a/src/tools/linux/md2core/minidump-2-core.cc
+++ b/src/tools/linux/md2core/minidump-2-core.cc
@@ -1223,6 +1223,9 @@ main(int argc, const char* argv[]) {
for (unsigned i = 0; i < header->stream_count; ++i) {
const MDRawDirectory* dirent =
dump.GetArrayElement<MDRawDirectory>(header->stream_directory_rva, i);
+ if (!dirent) {
+ continue;
+ }
switch (dirent->stream_type) {
case MD_SYSTEM_INFO_STREAM:
ParseSystemInfo(options, &crashinfo, dump.Subrange(dirent->location),
@@ -1241,6 +1244,9 @@ main(int argc, const char* argv[]) {
for (unsigned i = 0; i < header->stream_count; ++i) {
const MDRawDirectory* dirent =
dump.GetArrayElement<MDRawDirectory>(header->stream_directory_rva, i);
+ if (!dirent) {
+ continue;
+ }
switch (dirent->stream_type) {
case MD_THREAD_LIST_STREAM:
ParseThreadList(options, &crashinfo, dump.Subrange(dirent->location),
But running that errors out with Cannot determine input file format.
. Gabriele do you know what the best way to debug those dumps from Linux is?
I'll try to look at the binary and compare it with the ones from the previous push in case I find something interesting.
Comment 57•5 years ago
|
||
I couldn't debug the minidump, and looking at the symbols of the FFI functions involved, I'd just note that:
- With fat lto, the compiler unifies some of the symbols to point to the same function (
Servo_StyleRule_GetStyle
andServo_SupportsRule_GetRules
end up with the same address). - With thin/cross lto, that doesn't happen, and the function looks somewhat different (as in, there are some extra instructions that are not on the fat lto build).
But I don't know in which instruction we're actually crashig because of my previous comment, so I'm not quite sure about how to go about debugging this.
Comment 58•5 years ago
|
||
Create a mozconfig with:
ac_add_options --enable-project=tools/crashreporter
then mach build
.
You'll end up with a $objdir/dist/bin/minidump_stackwalk
that you can use by giving it the path to the .dmp and the path to the unzipped target.crashreporter-symbols.zip
.
This gives this:
Crash reason: EXC_BAD_ACCESS / EXC_I386_GPFLT
Crash address: 0x0
Process uptime: 128 seconds
Thread 0 (crashed)
0 XUL!servo_arc::Arc<T>::drop_slow + 0x80
rax = 0x0000000000000000 rdx = 0x0000000000000004
rcx = 0x0000000000000001 rbx = 0x000000011f889890
rsi = 0x0000000129a00e28 rdi = 0xe5e5e5e5e5e5e5e4
rbp = 0x00007fff5d78fb40 rsp = 0x00007fff5d78fb20
r8 = 0x000000010d7000b8 r9 = 0x000000000000000a
r10 = 0x00007fff5d78fa98 r11 = 0x0000000000000000
r12 = 0x0000000122c18ca0 r13 = 0x0000000122cc0158
r14 = 0x0000000122ca0460 r15 = 0x0000000000000000
rip = 0x00000001082f2290
Found by: given as instruction pointer in context
1 XUL!Servo_StyleRule_GetStyle [glue.rs:5e85998c4d97a236e8eaf1aaff622a979f60d789 : 2073 + 0x21]
rbp = 0x00007fff5d78fbc0 rsp = 0x00007fff5d78fb50
rip = 0x0000000108371fd3
Found by: previous frame's frame pointer
Note that while doing that, I was first welcome with
minidump.cc:5213: ERROR: Minidump header signature mismatch: (0x8088b1f, 0x1f8b0808) != 0x504d444d
which sounds like what you might have hit, and that's because taskcluster likes to "transparently" gzip things. So the downloaded .dmp is actually a .gz file you need to gunzip first.
Comment 59•5 years ago
|
||
Wait, hold on, looking at that closely again, that stack actually makes no actual sense! That is this function here:
And that's not supposed to drop any Arc
at all, wth?
That into_strong
call is supposed to prevent the Arc
getting dropped by transmuting it into a Strong<T>
, which has no Drop
implementation. So that stack smells a bit like miscompilation (hopefully) or UB going bad (hopefully not).
Comment 60•5 years ago
|
||
We haven't touched minidump-2-core
for quite a while so I'm not sure if it's reliable. On the other hand minidump_stackwalk
should print out some decent stacks following the instructions Mike wrote in comment 58.
Comment 62•5 years ago
|
||
If you disassemble XUL!Servo_StyleRule_GetStyle
, is there really a call to drop_slow at offset 0x21? (Or maybe a few bytes before, if 0x21 is the address of the instruction after)
Comment 63•5 years ago
•
|
||
Yes, there is a call to drop_slow
:
(lldb) disas -a 0x0000000005b70fd3
XUL`Servo_SupportsRule_GetRules:
XUL[0x5b70f20] <+0>: pushq %rbp
XUL[0x5b70f21] <+1>: movq %rsp, %rbp
XUL[0x5b70f24] <+4>: pushq %r15
XUL[0x5b70f26] <+6>: pushq %r14
XUL[0x5b70f28] <+8>: pushq %rbx
XUL[0x5b70f29] <+9>: subq $0x58, %rsp
XUL[0x5b70f2d] <+13>: movq %rsi, %r14
XUL[0x5b70f30] <+16>: movq %rdi, %rbx
XUL[0x5b70f33] <+19>: leaq 0x1b6eb9e(%rip), %rax ; _$LT$style..global_style_data..GLOBAL_STYLE_DATA$u20$as$u20$core..ops..deref..Deref$GT$::deref::__stability::LAZY::hd2043aa93a146e3c (.llvm.15187239529951727520)
XUL[0x5b70f3a] <+26>: movq %rax, -0x28(%rbp)
XUL[0x5b70f3e] <+30>: movq 0x1b6ebab(%rip), %rax ; _$LT$style..global_style_data..GLOBAL_STYLE_DATA$u20$as$u20$core..ops..deref..Deref$GT$::deref::__stability::LAZY::hd2043aa93a146e3c (.llvm.15187239529951727520) + 24
XUL[0x5b70f45] <+37>: cmpq $0x3, %rax
XUL[0x5b70f49] <+41>: jne 0x5b70fe9 ; <+201>
XUL[0x5b70f4f] <+47>: movq -0x28(%rbp), %rax
XUL[0x5b70f53] <+51>: cmpb $0x2, 0x10(%rax)
XUL[0x5b70f57] <+55>: je 0x5b7101e ; <+254>
XUL[0x5b70f5d] <+61>: movq (%rax), %r15
XUL[0x5b70f60] <+64>: testq %r15, %r15
XUL[0x5b70f63] <+67>: je 0x5b7103b ; <+283>
XUL[0x5b70f69] <+73>: movabsq $-0x8000000000000000, %rcx ; imm = 0x8000000000000000
XUL[0x5b70f73] <+83>: xorl %eax, %eax
XUL[0x5b70f75] <+85>: lock
XUL[0x5b70f76] <+86>: cmpxchgq %rcx, 0x8(%r15)
XUL[0x5b70f7b] <+91>: jne 0x5b71047 ; <+295>
XUL[0x5b70f81] <+97>: movq (%rbx), %rax
XUL[0x5b70f84] <+100>: testq %rax, %rax
XUL[0x5b70f87] <+103>: je 0x5b71023 ; <+259>
XUL[0x5b70f8d] <+109>: cmpq %rax, %r15
XUL[0x5b70f90] <+112>: jne 0x5b71023 ; <+259>
XUL[0x5b70f96] <+118>: movq -0x8(%r14), %rax
XUL[0x5b70f9a] <+122>: addq $-0x8, %r14
XUL[0x5b70f9e] <+126>: cmpq $-0x1, %rax
XUL[0x5b70fa2] <+130>: je 0x5b70fae ; <+142>
XUL[0x5b70fa4] <+132>: lock
XUL[0x5b70fa5] <+133>: incq (%r14)
XUL[0x5b70fa8] <+136>: jle 0x5b710bc ; <+412>
XUL[0x5b70fae] <+142>: movq 0x28(%rbx), %rax
XUL[0x5b70fb2] <+146>: addq $0x28, %rbx
XUL[0x5b70fb6] <+150>: movq (%rax), %rcx
XUL[0x5b70fb9] <+153>: cmpq $-0x1, %rcx
XUL[0x5b70fbd] <+157>: je 0x5b70fd3 ; <+179>
XUL[0x5b70fbf] <+159>: lock
XUL[0x5b70fc0] <+160>: decq (%rax)
XUL[0x5b70fc3] <+163>: jne 0x5b70fd3 ; <+179>
XUL[0x5b70fc5] <+165>: movq (%rbx), %rax
XUL[0x5b70fc8] <+168>: movq (%rax), %rax
XUL[0x5b70fcb] <+171>: movq %rbx, %rdi
XUL[0x5b70fce] <+174>: callq 0x5af1210 ; servo_arc::Arc$LT$T$GT$::drop_slow::h5bfb4b1675d356ee
XUL[0x5b70fd3] <+179>: movq %r14, (%rbx)
XUL[0x5b70fd6] <+182>: movq $0x0, 0x8(%r15)
XUL[0x5b70fde] <+190>: addq $0x58, %rsp
XUL[0x5b70fe2] <+194>: popq %rbx
XUL[0x5b70fe3] <+195>: popq %r14
XUL[0x5b70fe5] <+197>: popq %r15
XUL[0x5b70fe7] <+199>: popq %rbp
XUL[0x5b70fe8] <+200>: retq
XUL[0x5b70fe9] <+201>: leaq -0x28(%rbp), %rax
XUL[0x5b70fed] <+205>: movq %rax, -0x38(%rbp)
XUL[0x5b70ff1] <+209>: leaq -0x38(%rbp), %rax
XUL[0x5b70ff5] <+213>: movq %rax, -0x68(%rbp)
XUL[0x5b70ff9] <+217>: leaq 0x1b6eaf0(%rip), %rdi ; _$LT$style..global_style_data..GLOBAL_STYLE_DATA$u20$as$u20$core..ops..deref..Deref$GT$::deref::__stability::LAZY::hd2043aa93a146e3c (.llvm.15187239529951727520) + 24
XUL[0x5b71000] <+224>: leaq 0x1b1bd71(%rip), %rdx ; anon.20480550abd40f7c5a6ab6f2e04b754b.85.llvm.15187239529951727520
XUL[0x5b71007] <+231>: leaq -0x68(%rbp), %rsi
XUL[0x5b7100b] <+235>: callq 0x5f24d50 ; std::sync::once::Once::call_inner::h3f5b185bfe61a1f0 (.llvm.15187239529951727520)
XUL[0x5b71010] <+240>: movq -0x28(%rbp), %rax
XUL[0x5b71014] <+244>: cmpb $0x2, 0x10(%rax)
XUL[0x5b71018] <+248>: jne 0x5b70f5d ; <+61>
XUL[0x5b7101e] <+254>: callq 0x5edf530 ; lazy_static::lazy::unreachable_unchecked::h77f590b606dfa998 (.llvm.15187239529951727520)
XUL[0x5b71023] <+259>: leaq 0x80d97e(%rip), %rdi ; str.C.4618 + 104
XUL[0x5b7102a] <+266>: leaq 0x1b1b447(%rip), %rdx ; anon.20480550abd40f7c5a6ab6f2e04b754b.29.llvm.15187239529951727520 + 19440
XUL[0x5b71031] <+273>: movl $0x51, %esi
XUL[0x5b71036] <+278>: callq 0x5ad5920 ; std::panicking::begin_panic::h753c026e2313fa3b (.llvm.15187239529951727520)
XUL[0x5b7103b] <+283>: leaq 0x1b268d6(%rip), %rdi ; anon.0f09dff20376d92cc45253bc08e814bf.166.llvm.15187239529951727520
XUL[0x5b71042] <+290>: callq 0x5f2bf80 ; core::panicking::panic::haf72db49750c28ab (.llvm.15187239529951727520)
XUL[0x5b71047] <+295>: xorl %ecx, %ecx
XUL[0x5b71049] <+297>: testq %rax, %rax
XUL[0x5b7104c] <+300>: setns %cl
XUL[0x5b7104f] <+303>: leaq 0x80c631(%rip), %rax ; anon.20480550abd40f7c5a6ab6f2e04b754b.26.llvm.15187239529951727520
XUL[0x5b71056] <+310>: leaq 0x80c633(%rip), %rdx ; anon.20480550abd40f7c5a6ab6f2e04b754b.27.llvm.15187239529951727520
XUL[0x5b7105d] <+317>: cmovnsq %rax, %rdx
XUL[0x5b71061] <+321>: leaq 0x7(%rcx,%rcx), %rax
XUL[0x5b71066] <+326>: movq %rdx, -0x38(%rbp)
XUL[0x5b7106a] <+330>: movq %rax, -0x30(%rbp)
XUL[0x5b7106e] <+334>: leaq -0x38(%rbp), %rax
XUL[0x5b71072] <+338>: movq %rax, -0x28(%rbp)
XUL[0x5b71076] <+342>: leaq -0x3b970d(%rip), %rax ; _$LT$$RF$T$u20$as$u20$core..fmt..Display$GT$::fmt::h0031dc61ce6c65fd (.llvm.15187239529951727520)
XUL[0x5b7107d] <+349>: movq %rax, -0x20(%rbp)
XUL[0x5b71081] <+353>: leaq 0x1b167e0(%rip), %rax ; anon.20480550abd40f7c5a6ab6f2e04b754b.25.llvm.15187239529951727520
XUL[0x5b71088] <+360>: movq %rax, -0x68(%rbp)
XUL[0x5b7108c] <+364>: movq $0x2, -0x60(%rbp)
XUL[0x5b71094] <+372>: movq $0x0, -0x58(%rbp)
XUL[0x5b7109c] <+380>: leaq -0x28(%rbp), %rax
XUL[0x5b710a0] <+384>: movq %rax, -0x48(%rbp)
XUL[0x5b710a4] <+388>: movq $0x1, -0x40(%rbp)
XUL[0x5b710ac] <+396>: leaq 0x1b167d5(%rip), %rsi ; anon.20480550abd40f7c5a6ab6f2e04b754b.29.llvm.15187239529951727520
XUL[0x5b710b3] <+403>: leaq -0x68(%rbp), %rdi
XUL[0x5b710b7] <+407>: callq 0x5f168c0 ; std::panicking::begin_panic_fmt::h50d17c3950ddfe2a (.llvm.15187239529951727520)
XUL[0x5b710bc] <+412>: callq 0x5f251d0 ; std::process::abort::h1bde3ba12adc17b8 (.llvm.15187239529951727520)
XUL[0x5b710c1] <+417>: nopw %cs:(%rax,%rax)
XUL[0x5b710cb] <+427>: nopl (%rax,%rax)
Note that the right function in that stack is Servo_SupportsRule_GetRules
, but that function is pretty similar to Servo_StyleRule_GetStyle
, so I suspect the linker has unified them. However, those functions should not drop any arc.
Looking around, it seems that the linker has also unified Servo_StyleRule_SetStyle
(which does drop the Arc<>
) too, since that symbol is at the same address. So I suspect that the LTO stuff has messed up, and has unified Servo_SupportsRule_GetRules
, Servo_StyleRule_GetStyle
, and Servo_StyleRule_SetStyle
. I think that explains those crashes. But I'm not sure I can debug further to figure out why thinlto has unified those. Any idea?
Comment 64•5 years ago
|
||
Yeah, disas -n {any of those three symbols}
ends up at the same address.
Comment 65•5 years ago
|
||
David, I'm on Linux, any idea about how I could debug this? The way I'd go as of right now is trying to dump the LLVM IR for this with RUSTFLAGS="-C save-temps"
and figure out if there's something weird.
Then I guess trying to use the same rust / llvm compiler as automation and cross-compile a mac build and try to debug the LTO stuff...
Comment 66•5 years ago
|
||
Normally when functions get unified, that's the work of the linker's ICF, not LTO per se. It might be good to confirm this. If you're pushing to tryserver, try adding ac_add_options --disable-icf
. If you're building locally, ICF ought to be disabled already, if I'm reading compiler-opts.m4 correctly. Once we know that ICF is for sure disabled, disassemble the getter and setter again and see if it's any better.
Other than that, I wonder if there's any optimization remarks that we could enable (ala Comment 46) to see what LLVM is doing with Servo_SupportsRule_GetRules
.
Comment 67•5 years ago
|
||
The push that was backed out: https://treeherder.mozilla.org/#/jobs?repo=try&revision=4337efec72020e203957c327eddc75c2e97fa9c9
With --disable-icf
: https://treeherder.mozilla.org/#/jobs?repo=try&revision=91bd30ac9c627f2add7d2802eaf81ceb8709119b
Comment 69•5 years ago
|
||
So, the xul section sizes came out identical between the builds, meaning the icf flag had no effect. I don't see any icf=
in either of the build logs.
...aand I just remembered this is mac, so we're not using lld, so I have no familiarity at all here. Does ld64 default to icf?
Comment 70•5 years ago
•
|
||
There is a no_deduplicate
/ verbose_deduplicate
in the ld64 man page, which looks like ld64-speak for icf, and it looks like clang explicitly disables it under certain conditions, which might explain why the crashes are only in shippable/opt builds? (Disclaimer: Not a mac user, just driving by since this looks interesting and I'm anxiously waiting for xLTO)
edit: fixed link
Comment 71•5 years ago
|
||
Thanks for the pointer! Your markdown for "disables it" points back to this bug, was that supposed to be a different link?
Comment 72•5 years ago
|
||
Comment 73•5 years ago
|
||
There are some r+ patches which didn't land and no activity in this bug for 2 weeks.
:froydnj, could you have a look please?
For more information, please visit auto_nag documentation.
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Comment 74•5 years ago
|
||
Explicitly disabling ICF with ld64-specific options in the LTO setup does not seem to work:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=307546f26e77541f5855dff2046d2991e0ca1e86
which is, um, puzzling. I suppose the next step would be to see whether ICF was actually disabled by disassembling the appropriate functions in the binary.
The breakpad symbol file does say:
FUNC 5bd4e70 f6 0 Servo_StyleRule_GetStyle
FUNC 5bd4e70 1a1 0 Servo_StyleRule_SetStyle
which is, um, peculiar, and Servo_SupportsRule_GetRules
doesn't seem to be listed in the symbol file.
The breakpad symbol file from the push linked in comment 67 says:
FUNC 5b70f20 f6 0 Servo_StyleRule_GetStyle
5b70f20 1e 2077 12325
...
FUNC 5b70f20 1a1 0 Servo_StyleRule_SetStyle
5b71016 8 35 12800
...
without any references to Servo_SupportsRule_GetRules
either. Notice that those two functions are reputed to have the same start address, but very different sequences for their actual instructions + line numbers. I guess ICF there just completely botched the debug information?
But also, assuming that the information in the breakpad symbol file for each function is correct, how did those two functions get folded, since they have completely different lengths?
Comment 75•5 years ago
|
||
Fwiw, checking the most recent git history shows that in ld ~< 409.12 a custom hashing function atom_hashing::hash
(and a sameFixups
method I don't really understand) was used in dedup to compare Atoms, whereas the newest version also checks size and does an actual memcmp
. Now I don't know why this was added, but it just looked like maybe a tiny chance of a hash collision exists. Again, just poking in the dark here.
It may be more likely that there's another bug like this one. I'm not even sure if that one's fixed in the version used in automation, though the circumstances don't seem to match to what happens here.
Assignee | ||
Comment 76•5 years ago
|
||
(In reply to Johannes Pfrang [:johnp] from comment #75)
Fwiw, checking the most recent git history shows that in ld ~< 409.12 a custom hashing function
atom_hashing::hash
(and asameFixups
method I don't really understand) was used in dedup to compare Atoms, whereas the newest version also checks size and does an actualmemcmp
. Now I don't know why this was added, but it just looked like maybe a tiny chance of a hash collision exists. Again, just poking in the dark here.It may be more likely that there's another bug like this one. I'm not even sure if that one's fixed in the version used in automation, though the circumstances don't seem to match to what happens here.
That's a good point! Our cctools-port dates back to https://github.com/tpoechtrager/cctools-port/commit/8395d4b2c3350356e2fb02f5e04f4f463c7388df , so it's probably worth updating it to see if that fixes weird LTO bugs. We do have cctools-port#58 applied, but maybe the additional checks on atoms would fix some of the issues.
Assignee | ||
Comment 77•5 years ago
|
||
Assignee | ||
Comment 78•5 years ago
|
||
Well, that's disappointing: comment 77 obviously blew up, and after spending half a day getting the new version of cctools to build on our infra, we're still seeing peculiar stylo crashes:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=5626f8b1cb4c40aa90fa3aa174179da945b371cc
(Note that the cctools job successfully executed on a previous push, so the results from that previous job were used--assuming I understand taskcluster correctly.)
Assignee | ||
Comment 79•5 years ago
|
||
mochitest-4 left behind crash dumps, the first of which, 4D67623C-mumble, says the crashing stack is:
0 XUL!Servo_CssRules_ListTypes [glue.rs:5626f8b1cb4c40aa90fa3aa174179da945b371cc : 1883 + 0x49]
rax = 0x000000011264ac18 rdx = 0x0000000000000000
rcx = 0xf4005a6066f99b11 rbx = 0x00000001298ac790
rsi = 0x0000000000000001 rdi = 0x0000000000000000
rbp = 0x00007fff550b4800 rsp = 0x00007fff550b47b0
r8 = 0x0000000115f001b8 r9 = 0x000000000000000a
r10 = 0x0000000000000000 r11 = 0x0000000000000000
r12 = 0x000000011efb0500 r13 = 0x0000000000000000
r14 = 0x000000011efb04f8 r15 = 0x00000001298ac7d0
rip = 0x0000000110ab1eb9
Found by: given as instruction pointer in context
1 XUL!mozilla::dom::CSSSupportsRule::CSSSupportsRule(RefPtr<RawServoSupportsRule>, mozilla::StyleSheet*, mozilla::css::Rule*, unsigned int, unsigned int) [CSSSupportsRule.cpp:5626f8b1cb4c40aa90fa3aa174179da945b371cc : 23 + 0xe4]
rbp = 0x00007fff550b4850 rsp = 0x00007fff550b4810
rip = 0x000000010e5896e4
Found by: previous frame's frame pointer
2 XUL!mozilla::ServoCSSRuleList::GetRule(unsigned int) [ServoCSSRuleList.cpp:5626f8b1cb4c40aa90fa3aa174179da945b371cc : 81 + 0x8]
rbp = 0x00007fff550b48c0 rsp = 0x00007fff550b4860
rip = 0x000000010e5c4261
Found by: previous frame's frame pointer
3 XUL!mozilla::ServoStyleRuleMap::EnsureTable(mozilla::ServoStyleSet&) [ServoStyleRuleMap.cpp:5626f8b1cb4c40aa90fa3aa174179da945b371cc : 28 + 0xb4]
rbp = 0x00007fff550b4910 rsp = 0x00007fff550b48d0
rip = 0x000000010e96b06a
Found by: previous frame's frame pointer
4 XUL!mozilla::ServoStyleSet::StyleRuleMap() [ServoStyleSet.cpp:5626f8b1cb4c40aa90fa3aa174179da945b371cc : 1176 + 0xf]
rbp = 0x00007fff550b4960 rsp = 0x00007fff550b4920
rip = 0x000000010e5c8e6c
Found by: previous frame's frame pointer
5 XUL!mozilla::dom::InspectorUtils::GetCSSStyleRules(mozilla::dom::GlobalObject&, mozilla::dom::Element&, nsTSubstring<char16_t> const&, nsTArray<RefPtr<mozilla::BindingStyleRule> >&) [InspectorUtils.cpp:5626f8b1cb4c40aa90fa3aa174179da945b371cc : 183 + 0x5]
rbp = 0x00007fff550b49e0 rsp = 0x00007fff550b4970
rip = 0x000000010e968670
Found by: previous frame's frame pointer
6 XUL!mozilla::dom::InspectorUtils_Binding::getCSSStyleRules(JSContext*, unsigned int, JS::Value*) [InspectorUtilsBinding.cpp: : 2863 + 0x8]
rbp = 0x00007fff550b4b50 rsp = 0x00007fff550b49f0
rip = 0x000000010d48a772
Found by: previous frame's frame pointer
7 0x2a906ae7495f
rbp = 0x00007fff550b4bb0 rsp = 0x00007fff550b4b60
rip = 0x00002a906ae7495f
Found by: previous frame's frame pointer
The second, 3A69FB35-mumble, says:
0 XUL!servo_arc::Arc<T>::drop_slow + 0x80
rax = 0x000000011b4b04f0 rdx = 0x00000001679e3f60
rcx = 0x00000000000007fd rbx = 0x00000001663fb2a8
rsi = 0x00007fff58bde2b8 rdi = 0xe5e5e5e5e5e5e5e4
rbp = 0x00007fff58bde1d0 rsp = 0x00007fff58bde1b0
r8 = 0x0000000000000000 r9 = 0x000000000000000f
r10 = 0x0000000000c60000 r11 = 0xffff80020c90d418
r12 = 0x00000001679e3f60 r13 = 0x00000001663fb288
r14 = 0x00000001663e9e50 r15 = 0x0000000000000000
rip = 0x000000010cefdaf0
Found by: given as instruction pointer in context
1 XUL!Servo_Keyframe_GetStyle + 0xb3
rbp = 0x00007fff58bde250 rsp = 0x00007fff58bde1e0
rip = 0x000000010cf9bdc3
Found by: previous frame's frame pointer
2 XUL!mozilla::dom::CSSMozDocumentRule::CSSMozDocumentRule(RefPtr<RawServoMozDocumentRule>, mozilla::StyleSheet*, mozilla::css::Rule*, unsigned int, unsigned int) [CSSMozDocumentRule.cpp:5626f8b1cb4c40aa90fa3aa174179da945b371cc : 80 + 0x2d]
rbp = 0x00007fff58bde2a0 rsp = 0x00007fff58bde260
rip = 0x000000010aa5099d
Found by: previous frame's frame pointer
3 XUL!mozilla::ServoCSSRuleList::GetRule(unsigned int) [ServoCSSRuleList.cpp:5626f8b1cb4c40aa90fa3aa174179da945b371cc : 82 + 0x8]
rbp = 0x00007fff58bde310 rsp = 0x00007fff58bde2b0
rip = 0x000000010aa8d2d7
Found by: previous frame's frame pointer
4 XUL!nsTreeSanitizer::SanitizeChildren(nsINode*) [CSSRuleList.h:5626f8b1cb4c40aa90fa3aa174179da945b371cc : 28 + 0x10]
rbp = 0x00007fff58bde5f0 rsp = 0x00007fff58bde320
rip = 0x0000000108c5a72a
Found by: previous frame's frame pointer
5 XUL!mozilla::HTMLEditor::ParseFragment(nsTSubstring<char16_t> const&, nsAtom*, mozilla::dom::Document*, mozilla::dom::DocumentFragment**, bool) [HTMLEditorDataTransfer.cpp:5626f8b1cb4c40aa90fa3aa174179da945b371cc : 2448 + 0x10]
rbp = 0x00007fff58bde640 rsp = 0x00007fff58bde600
rip = 0x000000010a9eb404
Found by: previous frame's frame pointer
6 XUL!mozilla::HTMLEditor::DoInsertHTMLWithContext(nsTSubstring<char16_t> const&, nsTSubstring<char16_t> const&, nsTSubstring<char16_t> const&, nsTSubstring<char16_t> const&, mozilla::dom::Document*, mozilla::EditorDOMPointBase<nsCOMPtr<nsINode>, nsCOMPtr<nsIContent> > const&, bool, bool, bool) [HTMLEditorDataTransfer.cpp:5626f8b1cb4c40aa90fa3aa174179da945b371cc : 223 + 0x3e]
rbp = 0x00007fff58bdea60 rsp = 0x00007fff58bde650
rip = 0x000000010a9de4ff
Found by: previous frame's frame pointer
7 XUL!mozilla::HTMLEditor::InsertFromTransferable(nsITransferable*, mozilla::dom::Document*, nsTSubstring<char16_t> const&, nsTSubstring<char16_t> const&, bool, bool) [HTMLEditorDataTransfer.cpp:5626f8b1cb4c40aa90fa3aa174179da945b371cc : 1292 + 0x1c]
rbp = 0x00007fff58bded10 rsp = 0x00007fff58bdea70
rip = 0x000000010a9e7a0f
Found by: previous frame's frame pointer
...
Servo_Keyframe_GetStyle
appears to have been ICF'd to Servo_Keyframe_SetStyle
; again, seemingly different function sizes, so ICF should have left those alone.
emilio, does the first stack make any more sense?
Comment 80•5 years ago
|
||
It seems like the first style should be Servo_SupportsRule_GetRules
. It doesn't make any sense for that function to be unified with Servo_CSSRules_ListTypes, which writes to an array and what not: https://searchfox.org/mozilla-central/rev/7e158713cf5a8514fa8161dd4a239737b05da64d/servo/ports/geckolib/glue.rs#1879
Comment 81•5 years ago
|
||
Are we sure that the suspect address is really doing the work of a Servo_CSSRules_ListTypes
, and it's not just the debug info lying to us? (A long shot, but it would be good to explicitly rule out.)
Assignee | ||
Comment 82•5 years ago
|
||
-verbose_deduplicate
actually says there are 0 duplicated functions across all of Firefox. Which surprises me, but I think makes a certain amount of sense under LTO: LLVM actually contains a function-merging pass for doing ICF-esque things at compile-time, rather than link-time...which with LTO means that we're doing ICF not with the linker, but with the compiler that gets invoked by the linker. Which means that we're possibly dealing with a bug in the compiler, ugh.
Unfortunately the history for the MergeFunctions
pass doesn't seem to contain anything obviously illuminating in terms of bugfixes that we might be missing:
https://github.com/llvm-mirror/llvm/commits/master/lib/Transforms/IPO/MergeFunctions.cpp
Nor does the logging story for the pass inspire confidence: getting basic information out of the pass like "these two functions are getting merged" requires a debug build (!). So trying to figure out what's going on requires a debug LLVM, or at least some patches to encourage other behavior.
Oh, and for added fun, the problems could be in Rust's LLVM, too (this seems unlikely to me, but worth mentioning for completeness).
This could be wrong, of course, but it seems at least plausible.
Comment 83•5 years ago
•
|
||
Rustc auto-enables merge-funcs:
/// Determines how or whether the MergeFunctions LLVM pass should run for
/// this target. Either "disabled", "trampolines", or "aliases".
/// The MergeFunctions pass is generally useful, but some targets may need
/// to opt out. The default is "aliases".
///
/// Workaround for: https://github.com/rust-lang/rust/issues/57356
pub merge_functions: MergeFunctions,
I gave a shot at disabling that pass here, we'll see: https://treeherder.mozilla.org/#/jobs?repo=try&revision=b9b59a888533dcdeff09591b11ae68f5412cbd46
(Edited since at first I was going to ask how to disable it but then I figured out that RUSTC_BOOTSTRAP would do the trick)
Comment 84•5 years ago
|
||
Hmm, that didn't work... Nathan, any idea why? Should I try to make osx builds in tc use rust nightly? Or am I way off? :)
Assignee | ||
Comment 85•5 years ago
|
||
(In reply to Emilio Cobos Álvarez (:emilio) from comment #84)
Hmm, that didn't work... Nathan, any idea why? Should I try to make osx builds in tc use rust nightly? Or am I way off? :)
I suppose you'd have to use nightly. You can modify the job itself:
https://searchfox.org/mozilla-central/source/taskcluster/ci/build/macosx.yml#294
to use the linux64-rust-nightly-macos
toolchain, or you can modify the linux64-rust-macos
toolchain:
https://searchfox.org/mozilla-central/source/taskcluster/ci/toolchain/rust.yml#123
(notice the toolchain-alias
field) to repact nightly from some date rather than beta from some date.
I am trying to build things locally to investigate as well.
Assignee | ||
Comment 86•5 years ago
|
||
Actually, I suppose you'd have to bump the linux64-rust-nightly-macos
to use a newer date, so that it picks up:
Assignee | ||
Comment 87•5 years ago
|
||
OK, so a local build says that libgkrust.a
has distinct (LLVM bitcode) code for Servo_Keyframe_{Get,Set}Style
and Servo_StyleRule_{Get,Set}Style
are distinct functions. I haven't diff'd them, but eyeballing them says they should not be merged. It does look like they could almost share prefixes of code, depending on how optimization shakes out, but that seems hard to do.
Assignee | ||
Comment 88•5 years ago
|
||
...and looking at XUL in a local build, those functions from comment 87 have distinct addresses. So how are the crashes happening and why is the breakpad symbol file such garbage?
Comment 89•5 years ago
|
||
Garbage DWARF?
Assignee | ||
Comment 90•5 years ago
|
||
OK, something is clearly going wrong. In the llvm bitcode produced by rustc
, we have:
froydnj@hawkeye:/opt/build/froydnj/build-macosx-cross/objs$ egrep 'Keyframe_...Style' gkrust.dis
@Servo_MozDocumentRule_GetRules = unnamed_addr alias i64 (%"nsstring::nsACString"*), i64 (%"nsstring::nsACString"*)* @Servo_Keyframe_GetStyle
define i64 @Servo_Keyframe_GetStyle(%"nsstring::nsACString"* noalias nocapture nonnull readonly align 1) unnamed_addr #2 personality i32 (i32, i32, i64, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"*)* bitcast (void ()* @GkRust_Shutdown to i32 (i32, i32, i64, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"*)*) !dbg !4131763 {
define void @Servo_Keyframe_SetStyle(%"nsstring::nsACString"* noalias nocapture nonnull readonly align 1, %"nsstring::nsACString"* noalias nonnull readonly align 1) unnamed_addr #2 personality i32 (i32, i32, i64, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"*)* bitcast (void ()* @GkRust_Shutdown to i32 (i32, i32, i64, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"*)*) !dbg !4131973 {
Which seems kind of reasonable (I haven't looked at the code to make sure Servo_MozDocument_GetRules
and Servo_Keyframe_GetStyle
actually access things at identical offsets). But then in the actual library, we have:
froydnj@hawkeye:/opt/build/froydnj/build-macosx-cross/objs$ ~/.mozbuild/clang/bin/llvm-nm ../toolkit/library/XUL |egrep 'Servo_'
00000000059751f0 T _Servo_Keyframe_SetStyle
00000000059750f0 T _Servo_Keyframe_GetStyle
00000000059751f0 T _Servo_MozDocumentRule_GetRules
Notice that at the bitcode IR level, we have &Servo_Keyframe_GetStyle == &Servo_MozDocumentRule_GetRules
, but by the time we make it to the final object, we have &Servo_Keyframe_SetStyle == &Servo_MozDocumentRule_GetRules
, which is, um, wrong. We can look at how many Servo_*
functions wind up deduplicated:
froydnj@hawkeye:/opt/build/froydnj/build-macosx-cross/objs$ ~/.mozbuild/clang/bin/llvm-nm ../toolkit/library/XUL |egrep 'Servo_' |sort -g -k 1 | uniq -D --check-chars=16
0000000005a03b70 T _Servo_CloneBasicShape
0000000005a03b70 T _Servo_UseCounters_Drop
0000000005cd1ec0 T _Servo_CssRules_Release
0000000005cd1ec0 T _Servo_MediaList_AddRef
0000000005cd2150 T _Servo_AnimationValue_Release
0000000005cd2150 T _Servo_CounterStyleRule_AddRef
0000000005cd2150 T _Servo_CssUrlData_AddRef
0000000005cd2150 T _Servo_DeclarationBlock_AddRef
0000000005cd2150 T _Servo_DocumentRule_AddRef
0000000005cd2150 T _Servo_FontFaceRule_AddRef
0000000005cd2150 T _Servo_FontFeatureValuesRule_AddRef
0000000005cd2150 T _Servo_ImportRule_AddRef
0000000005cd2150 T _Servo_Keyframe_AddRef
0000000005cd2150 T _Servo_KeyframesRule_AddRef
0000000005cd2150 T _Servo_MediaRule_AddRef
0000000005cd2150 T _Servo_NamespaceRule_AddRef
0000000005cd2150 T _Servo_PageRule_AddRef
0000000005cd2150 T _Servo_StyleRule_AddRef
0000000005cd2150 T _Servo_StyleSheetContents_AddRef
0000000005cd2150 T _Servo_SupportsRule_AddRef
00000000059751f0 T _Servo_Keyframe_SetStyle
00000000059751f0 T _Servo_MozDocumentRule_GetRules
0000000005956780 T _Servo_StyleRule_SetStyle
0000000005956780 T _Servo_SupportsRule_GetRules
0000000005976e10 T _Servo_NamespaceRule_GetPrefix
0000000005976e10 T _Servo_PageRule_GetStyle
Presumably Servo_StyleRule_SetStyle
and Servo_SupportsRule_GetRules
shouldn't have been folded together. Servo_CssRules_Release
should also be a distinct function from Servo_MediaList_AddRef
, I'd think? Same with Servo_AnimationValue_Release
and Servo_CounterStyleRule_AddRef
. Not sure about the Servo_CloneBasicShape
and Servo_UseCounters_Drop
equivalency.
I really hope this is an artifact of enabling cross-language LTO, and not an issue we're seeing today...
I wonder if I can convince ld64
to dump all the bitcode post-LLVM optimization, but before translation to object code and therefore, ideally, before ld64
optimizations run on the object code.
Assignee | ||
Comment 91•5 years ago
|
||
ld64
dumps all of its intermediate LLVM bitcode and post-LLVM-optimization Mach-O files (!) with -save-temps
. (If you are playing along at home, have ~40GB free to do this, on top of the space for your normal objdir.) ld64
can also be convinced to pass in options to its invoked LLVM via -mllvm $OPTION
(-Wl,-mllvm -Wl,$OPTION
from the compiler), but I have not been able to get what I think is some debugging code run by doing so, so I must be doing something wrong.
For the merged functions in comment 90, you can see in the intermediate Mach-O file that LLVM produces (these are held in memory if you're not dumping with -save-temps
) that everything is OK:
00000000005822b0 T _Servo_StyleRule_GetStyle
00000000005822b0 T _Servo_SupportsRule_GetRules
00000000005823b0 T _Servo_StyleRule_SetStyle
...
00000000005a0d20 T _Servo_Keyframe_GetStyle
00000000005a0d20 T _Servo_MozDocumentRule_GetRules
00000000005a0e20 T _Servo_Keyframe_SetStyle
...
00000000008fdad0 T _Servo_CssRules_AddRef
00000000008fdad0 T _Servo_MediaList_AddRef
00000000008fdaf0 T _Servo_CssRules_Release
...
000000000062f790 T _Servo_SharedMemoryBuilder_Drop
000000000062f790 T _Servo_UseCounters_Drop
000000000062f7a0 T _Servo_CloneBasicShape
...
00000000008fdd60 T _Servo_AnimationValue_AddRef
00000000008fdd60 T _Servo_CounterStyleRule_AddRef
00000000008fdd80 T _Servo_AnimationValue_Release
I think there must be some off-by-one error somewhere in ld64
itself based on this: the object files are not displaying the issues that the final XUL
library is showing. We also see the pattern: the first two symbols have the same address and the third symbol has a different address...but the second and third symbols wind up (wrongly) having the same address in the final output file, whereas the first symbol has a distinct address in the output file.
Assignee | ||
Comment 92•5 years ago
|
||
We will need a fixed linker to enable this on Mac, but the fix is pretty simple and relatively easily justified.
The Mac linker calls symbols "atoms" and maintains a bunch of state hanging off different kinds of atoms. One of those bits (literally) of state tracks whether one atom is an alias of another:
https://github.com/tpoechtrager/cctools-port/blob/master/cctools/ld64/src/ld/ld.hpp#L1173
Alias information is used in a couple of different places, notably in ordering the atoms for final emission into the object file:
You may have noticed that said bit is only enough space to store whether an atom is an alias of another, and not which atom is being aliased. You might imagine that this would be a problem, and you would be correct; we'll come back to this issue.
When we're doing LTO with the Mac linker, we load all of the LLVM bitcode and unleash LLVM on said bitcode. This step generates Mach-O files in memory, which then linker parses as a separate step, to pass the compiled code through the regular linker pipeline. This arrangement also enables the linker to elegantly handle non-bitcode files (e.g. our assembly XPConnect files, various media codecs, etc.).
(Side note: it might be worth exploring, in a cross-LTO setup, if we could just have Rust generate a bunch of stupid, unoptimized LLVM bitcode, which we would rely on the cross-language LTO side of things to clean up. Maybe this would make things faster--don't re-optimize already-optimized Rust bitcode--or maybe not.)
Under this model, when we load the Mach-O file:
we want to keep track of two different sets of atoms: the atoms uncovered by the LLVM compilation, and the atoms contained in the Mach-O file itself. Maintaining correspondence between those sets is the responsibility of the AtomSyncer
in the code fragment above. For our purposes, the only case of atoms from the Mach-O file we care about is this case:
At this point, we have a Mach-O atom with some $NAME, and we also know there's an LLVM atom with an identical $NAME. We're going to tell the LLVM atom that its "compiled atom" is the Mach-O atom:
You'll notice that one of the lines in the above fragment says: "update fields in ld::Atom to match newly constructed mach-o atom", which gets us to here:
https://github.com/tpoechtrager/cctools-port/blob/master/cctools/ld64/src/ld/ld.hpp#L1135-L1152
You may notice that this block of code does not copy over the alias bit. (I confess I'm not quite sure why this matters; the alias information is explicit in the LLVM bitcode, so one might think it should be associated with the LLVM atom, but it apparently doesn't make it into the actual atom data structures?)
We now have enough information to describe what happens:
- The LTO-compiled object files have three symbols, S1, S2, and S3 in that order; S1 is the "real" symbol and S2 is the alias to S1.
- As a result, S1 has a non-zero size (it has code associated with it), while S2's size is 0.
- When we load the compiled Mach-O files after LTO has taken place, S2 doesn't get marked as an alias.
- We order the symbols (this can be done for, e.g. optimizing load times). If S2 had been marked as an alias, it would have been sorted before S1. But that doesn't happen, because S2 is not marked as an alias. So the symbols maintain their order: S1, S2, S3.
- At some later point, we assign addresses to everything by looping over the symbols in order:
- We process S1 and advance
offset
by its size. - We process S2, which has a 0 size, so
offset
doesn't get advanced. - We process S3, which is assigned to the same address/offset as S2, which is wrong. Boom.
This doesn't happen today, because the alias information is correct all the way through the pipeline; we don't go through the bits in lto_file.cpp, above, that desynchronize the alias information (whew!).
The fix, as are all such fixes, is quite simple: we just need to copy over the _alias
field in this function:
https://github.com/tpoechtrager/cctools-port/blob/master/cctools/ld64/src/ld/ld.hpp#L1135-L1152
and everything works as it should from there on out.
Assignee | ||
Comment 93•5 years ago
|
||
I haven't actually run all the tests or even pushed this to try, so it's possible there are still other bugs waiting. But a try run will have to wait until tomorrow.
Assignee | ||
Comment 94•5 years ago
|
||
...and a (mostly) green try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=ade3c06bd559e255e51e1b24f12fef443654e424
I don't know what to do about the dt-wr timeouts. I'd guess that cross-language LTO might be changing the timing behavior of certain functions (??), but I would not expect the timeouts to be quite so consistent. Brian, can we just disable the entire timing-out test file, or will you have a chance to look at bug 1502301 in the next couple of days?
Comment 95•5 years ago
|
||
(In reply to Nathan Froyd [:froydnj] from comment #94)
...and a (mostly) green try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=ade3c06bd559e255e51e1b24f12fef443654e424
I don't know what to do about the dt-wr timeouts. I'd guess that cross-language LTO might be changing the timing behavior of certain functions (??), but I would not expect the timeouts to be quite so consistent. Brian, can we just disable the entire timing-out test file, or will you have a chance to look at bug 1502301 in the next couple of days?
Don't worry about the dt-wr timeouts, these tests are tier 2 and shouldn't interfere with other development. I can look into them after things land.
Assignee | ||
Comment 96•5 years ago
|
||
OK, for reasons that I do not understand, the linux32 build has started failing:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=4657aeec9242ab1cbb8ce60358e77532d686f0cf
(The Android x86 debug fuzzing build is weird too, but that appears to be a Java thing, rather than something wrong with the patch.)
I think libjsrust.a
must be a bunch of LLVM bitcode, and linking gdb-tests
is not happening with -flto=thin
, so things are going sideways. I do not understand why this isn't happening on other builds, though...
Comment 97•5 years ago
|
||
Linux32 builds are not busted anymore, but cross-osx ones are, which is weird:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=209c35a2c0adfc2e019f917c7e7f52985a249b38
Comment 98•5 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #97)
Linux32 builds are not busted anymore, but cross-osx ones are, which is weird:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=209c35a2c0adfc2e019f917c7e7f52985a249b38
I guess this is LLVM ERROR: ThinLTO modules with incompatible triples not supported
again (perhaps similar to bug 1551690). Mike is this something you can continue to look into?
Comment 99•5 years ago
|
||
So it turns out bug 1551690 was not enough. We do build with clang --target=x86_64-apple-darwin
, and the triple in rust LLVM-IR is x86_64-apple-darwin
... but the triple in the LLVM-IR emitted by clang is... x86_64-apple-macosx10.9.0
.
Comment 100•5 years ago
•
|
||
Funnily enough clang --target=x86_64-apple-darwin --print-target-triple
prints x86_64-apple-darwin
. (or --print-effective-triple
for that matter)
Updated•5 years ago
|
Updated•5 years ago
|
Comment 101•5 years ago
|
||
I confirmed it works once we upgrade to rust 1.36, which, for some reason, was forgotten in bug 1563378. Bug 1566309 will fix that.
Comment 102•5 years ago
|
||
Bug 1565757 made me realize that cross-language LTO means libgkrust.a is not a real library anymore, which means the checks we run against it silently cannot find any problem they're supposed to find. The only test I know we run is check_networking
from python/mozbuild/mozbuild/action/check_binary.py
.
Comment 103•5 years ago
|
||
And FWIW, Rust 1.38 upgraded LLVM to a trunk version... this is going to be fun.
Comment 104•5 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #102)
Bug 1565757 made me realize that cross-language LTO means libgkrust.a is not a real library anymore, which means the checks we run against it silently cannot find any problem they're supposed to find. The only test I know we run is
check_networking
frompython/mozbuild/mozbuild/action/check_binary.py
.
As long as the linux64 opt builds stay non-LTOed, we're good to go, though. So I'm going to land this.
Comment 105•5 years ago
|
||
Pushed by mh@glandium.org: https://hg.mozilla.org/integration/autoland/rev/402b26b7e514 switch all existing `MOZ_LTO` configs to use cross-language LTO; r=dmajor https://hg.mozilla.org/integration/autoland/rev/8ba3c1292475 default clang-cl pgo to use cross-language LTO; r=dmajor
Comment 106•5 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/402b26b7e514
https://hg.mozilla.org/mozilla-central/rev/8ba3c1292475
Updated•5 years ago
|
Comment 107•5 years ago
|
||
Since this patch landed, it started to perm fail:
Central: https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&selectedJob=257365891&resultStatus=pending%2Crunning%2Csuccess%2Ctestfailed%2Cbusted%2Cexception&searchStr=windows%2C10%2Cx64%2Copt%2Ctest-windows10-64%2Fopt-gtest-1proc%2C%28gtest%29&fromchange=e8a1932001eb18766bcf6e4c1c5fbb484b58b095
Mike, can you please take a look?
Assignee | ||
Comment 108•5 years ago
|
||
(In reply to Cristina Coroiu [:ccoroiu] from comment #107)
Since this patch landed, it started to perm fail:
Central: https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&selectedJob=257365891&resultStatus=pending%2Crunning%2Csuccess%2Ctestfailed%2Cbusted%2Cexception&searchStr=windows%2C10%2Cx64%2Copt%2Ctest-windows10-64%2Fopt-gtest-1proc%2C%28gtest%29&fromchange=e8a1932001eb18766bcf6e4c1c5fbb484b58b095Mike, can you please take a look?
The tests seem to be getting stuck at:
20:51:06 INFO - TEST-START | TestPACMan.WhenTheDHCPResponseIsEmptyWPADDefaultsToStandardURL
20:56:06 ERROR - gtest TEST-UNEXPECTED-FAIL | gtest | timed out after 300 seconds without output
Valentin is on PTO. :( Nhi, do you know who might be able to offer some insight about this test in Valentin's absence?
It's curious to me that this particular test is always the one that's timing out, but maybe there are some Rust URL peculiarities specific to this particular test?
Comment 109•5 years ago
|
||
Made a bug for this issue: https://bugzilla.mozilla.org/show_bug.cgi?id=1567921
Comment 110•5 years ago
|
||
== Change summary for alert #21992 (as of Fri, 19 Jul 2019 17:31:28 GMT) ==
Improvements:
17% raptor-tp6-youtube-firefox loadtime windows10-64-shippable-qr opt 973.17 -> 803.88
16% raptor-tp6-youtube-firefox loadtime windows10-64-shippable opt 955.60 -> 800.38
For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=21992
Assignee | ||
Comment 111•5 years ago
|
||
I think the regression was fixed by only enabling LTO for shippable builds, so I'm going to call comment 107 fixed.
Comment 112•5 years ago
|
||
Enabling LTO for shippable builds caused use to cancel the improvement gained in #110
== Change summary for alert #22207 (as of Wed, 31 Jul 2019 10:40:57 GMT) ==
Regressions:
20% raptor-tp6-youtube-firefox loadtime windows10-64-shippable opt 799.29 -> 955.71
17% raptor-tp6-youtube-firefox loadtime windows10-64-shippable-qr opt 806.48 -> 943.21
5% raptor-tp6-youtube-firefox windows10-64-shippable opt 532.82 -> 558.29
2% raptor-tp6-sheets-firefox loadtime windows10-64-shippable-qr opt 885.35 -> 905.42
2% raptor-tp6-facebook-firefox-cold loadtime windows10-64-shippable opt 830.75 -> 849.17
2% raptor-tp6-sheets-firefox loadtime windows10-64-shippable opt 881.21 -> 899.96
2% raptor-tp6-binast-instagram-firefox loadtime windows10-64-shippable-qr opt 435.08 -> 442.62
For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=22207
As stated in https://bugzilla.mozilla.org/show_bug.cgi?id=1568450#c1 we will consider this cancellation acceptable
Comment 113•5 years ago
|
||
== Change summary for alert #22093 (as of Wed, 24 Jul 2019 00:33:50 GMT) ==
Regressions:
3% tabswitch windows10-64-shippable-qr opt e10s stylo 6.97 -> 7.21
2% tp5o_webext windows10-64-shippable opt e10s stylo 154.00 -> 157.60
For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=22093
Reporter | ||
Comment 114•5 years ago
|
||
Stylo's Gecko wrapper duplicated some logic from the C++ side so
that the Rust compiler would be able to optimize better. Now that
we have xLTO, this kind of manual inlining should not be neccessary
anymore.
Comment 115•5 years ago
|
||
Comment on attachment 9088699 [details]
Bug 1486042 - Remove duplicated logic from Stylo's Gecko wrapper. r=emilio
Revision D43755 was moved to bug 1577171. Setting attachment 9088699 [details] to obsolete.
Description
•