Closed Bug 1336155 Opened 8 years ago Closed 8 years ago

Update builders to rust 1.15

Categories

(Firefox Build System :: General, defect)

defect
Not set
normal

Tracking

(firefox54 fixed)

RESOLVED FIXED
mozilla54
Tracking Status
firefox54 --- fixed

People

(Reporter: rillian, Assigned: rillian)

References

Details

Attachments

(3 files)

Rust 1.15.0-stable is released today. Update our integration build config to use it for Firefox 54.
Assignee: nobody → giles
Blocks: oxidation
Attachment #8833017 - Flags: review?(mshal) → review+
Comment on attachment 8833018 [details] Bug 1336155 - Update builders to rust 1.15.0 stable. https://reviewboard.mozilla.org/r/109256/#review110402 These look fine to me, but it seems the linux32 builds don't agree. Any idea what's up with those?
The link is failing: > /home/worker/workspace/build/src/gcc/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.9.4/../../../../x86_64-unknown-linux-gnu/bin/ld: read-only segment has dynamic relocations. > collect2: error: ld returned 1 exit status I have no insight beyond that. I can reproduce locally (with gold from fedora 25) so I don't think it's an issue with our packaged linker. Seems to be a regression from 1.14. Someday I'll get around to testing all platforms before release... I suggest we just keep linux32 on rust 1.14 for now, removing it from the patch.
I suggest we try to find what's wrong first. In all likeliness, something is wrong in one of rust's static libraries. Push another try with -Wl,-z,text removed from LDFLAGS (it's added in old-configure.in and js/src/old-configure.in). It's going to be green, and will allow to look at the resulting libxul.so to see what ld is complaining about when -Wl,-z,text is passed. (LD sucks for not having more useful errors in that case)
Specifically, we're not in a rush to use this new rust release on the very day it's released, especially if we end up not using it on all platforms.
1.15 has stable procedural macros, which are the new and only way to derive impls for serde in its 0.9 version. My pull requests are pretty much ready to go for the serde bump. But serde_codegen doesn't exist anymore for serde 0.9, so to not block people working on WR with Gecko, we need Rust 1.15 here. So it would be great if that bump could be done fast, I don't like serde bumps to linger for long.
(In reply to Anthony Ramine [:nox] from comment #8) > 1.15 has stable procedural macros, which are the new and only way to derive > impls for serde in its 0.9 version. My pull requests are pretty much ready > to go for the serde bump. But serde_codegen doesn't exist anymore for serde > 0.9, so to not block people working on WR with Gecko, we need Rust 1.15 here. The suggestion in comment 5 wouldn't help you in any way.
(In reply to Mike Hommey [:glandium] from comment #9) > The suggestion in comment 5 wouldn't help you in any way. I'm pretty sure I didn't imply that.
FWIW I've never seen that before, much less anywhere in in the rust-lang/rust issue tracker. I'd be quite interested to learn about the fix though! The biggest change I can think of is that we're compiling compilers with a new build system, but other than that no change leaps to mind in 1.15.0. Even that change itself I wouldn't know why would cause a bug such as this...
I've not sure what glandium was looking for in comment 6, but I pushed to try with the -Wl,-z,text ("Treat DT_TEXTREL in shared object as error.") option removed. The link succeeds, but the build subsequently fails with > TEST-UNEXPECTED-FAIL | check_textrel | We do not want text relocations in libraries and programs Which confirms it's a text segment relocation, but I don't see any information about which symbol has the relocation. I'll try running that check locally. There's also an xpcshell crash which might tell us something. How do I put symbols to the stack dump in the log? https://treeherder.mozilla.org/#/jobs?repo=try&revision=563f566da23d&selectedJob=74379135
Gah forgot about that check. Remove it so that the build doesn't fail. The xpcshell crash is a red herring. It's a normal occurrence, and the log viewer is just confused by it.
For posterity, here's how you check those things: - Get the address of the end of the executable segment(s) e.g. readelf -lW libxul.so | awk '$1 == "LOAD" && $7 != "RW" {a=$3;b=$6}END{print strtonum(a)+strtonum(b)}' in the build I looked at, it's 61803364 (decimal, 0x3af0b64 hex) - Find all relocations in the executable segment(s) e.g. readelf -r libxul.so | awk 'strtonum("0x" $1) <= 61803364 && NF >= 3 && NF < 6' in the build I looked at, that returns the following: 02af13cf 00000008 R_386_RELATIVE 02af1505 00000008 R_386_RELATIVE 02af150f 00000008 R_386_RELATIVE 02af18cb 00000008 R_386_RELATIVE 02af19f9 00000008 R_386_RELATIVE 02af1a03 00000008 R_386_RELATIVE 02af1d0c 00000008 R_386_RELATIVE 02af2164 00000008 R_386_RELATIVE 02af112b 00035002 R_386_PC32 00000000 fmax@GLIBC_2.1 02af1133 00024602 R_386_PC32 00000000 logb@GLIBC_2.0 02af11a3 000d2102 R_386_PC32 00000000 scalbn@GLIBC_2.0 02af11b7 000d2102 R_386_PC32 00000000 scalbn@GLIBC_2.0 02af11f9 000d2102 R_386_PC32 00000000 scalbn@GLIBC_2.0 02af1225 000d2102 R_386_PC32 00000000 scalbn@GLIBC_2.0 02af165e 0004eb02 R_386_PC32 00000000 fmaxf@GLIBC_2.1 02af1666 00048002 R_386_PC32 00000000 logbf@GLIBC_2.0 02af16c8 00021002 R_386_PC32 00000000 scalbnf@GLIBC_2.0 02af16dc 00021002 R_386_PC32 00000000 scalbnf@GLIBC_2.0 02af171a 00021002 R_386_PC32 00000000 scalbnf@GLIBC_2.0 02af1742 00021002 R_386_PC32 00000000 scalbnf@GLIBC_2.0 The important part is really the first column. For each of those, we can use addr2line -e libxul.so 0xaddr to see what symbol the relocation is happening in. (That requires downloading the crashreporter-symbols-full.zip and getting libxul.so.dbg from there) Unfortunately, that only gave ??:? results, which suggests we don't have debuginfo for whatever this is happening in. Next step is to look at the symbols table: e.g. readelf -sW firefox/libxul.so.dbg | sort -k 2 After correlating with the addresses from the relocations, the culprits are: 118343: 02af1f90 1105 FUNC LOCAL DEFAULT 12 __mulsc3 151843: 02af1640 986 FUNC LOCAL DEFAULT 12 __divsc3 240176: 02af1b30 1113 FUNC LOCAL DEFAULT 12 __muldc3 240995: 02af10e0 1117 FUNC LOCAL DEFAULT 12 __divdc3 Those are things that come from libgcc.a. If I look at current linux32 nightly, those symbols are also there too. But I'm pretty sure before we were using rust, we were pulling them from libgcc.so. Looking at the rust release: readelf -sW rust-1.15.0-i686-unknown-linux-gnu/rust-std-i686-unknown-linux-gnu/lib/rustlib/i686-unknown-linux-gnu/lib/libcompiler_builtins-b4ef8c9d93bc879a.rlib | grep '__\(div\|mul\)[ds]c3' 10: 00000000 1117 FUNC GLOBAL HIDDEN 4 __divdc3 10: 00000000 986 FUNC GLOBAL HIDDEN 4 __divsc3 10: 00000000 1113 FUNC GLOBAL HIDDEN 4 __muldc3 10: 00000000 1105 FUNC GLOBAL HIDDEN 4 __mulsc3 They were there in 1.14: readelf -sW rust-1.14.0-i686-unknown-linux-gnu/rust-std-i686-unknown-linux-gnu/lib/rustlib/i686-unknown-linux-gnu/lib/libcompiler_builtins-f5a209a9.rlib | grep '__\(div\|mul\)[ds]c3' 21: 00000000 1189 FUNC GLOBAL HIDDEN 2 __divdc3 21: 00000000 1077 FUNC GLOBAL HIDDEN 2 __divsc3 20: 00000000 1129 FUNC GLOBAL HIDDEN 2 __muldc3 20: 00000000 1121 FUNC GLOBAL HIDDEN 2 __mulsc3 Note the size differences. So, to me, there are two problems here: - In all likeliness, the entire libcompiler_builtins library is not built with -fPIC in 1.15. - It looks like libcompiler_builtins is essentially libgcc, and it feels like we should be using libgcc instead.
> But I'm pretty sure before we were using rust, we were pulling them from libgcc.so. Confirmed with 45.7.0esr: $ readelf --dyn-sym firefox/libxul.so | grep '__\(div\|mul\)[ds]c3' 383: 00000000 0 FUNC GLOBAL DEFAULT UND __muldc3@GCC_4.0.0 (23) 1545: 00000000 0 FUNC GLOBAL DEFAULT UND __divsc3@GCC_4.0.0 (23) 3588: 00000000 0 FUNC GLOBAL DEFAULT UND __divdc3@GCC_4.0.0 (23) 3622: 00000000 0 FUNC GLOBAL DEFAULT UND __mulsc3@GCC_4.0.0 (23)
Thanks for the investigation! Apparently me from a long time ago [1] thought that 32-bit didn't need -fPIC, but I can't remember why at this point. I've re-enabled this [2], published a new version, and send a PR upstream [3]. Once that lands we can backport to beta as well. We've also already got a 1.15.1 point release coming out soon, so if urgent we can put this in there as well. [1]: https://github.com/alexcrichton/gcc-rs/commit/362bdf20 [2]: https://github.com/alexcrichton/gcc-rs/commit/7d37a9a6 [3]: https://github.com/rust-lang/rust/pull/39523
(In reply to Alex Crichton [:acrichto] from comment #21) > We've also already got a 1.15.1 point release coming out soon, > so if urgent we can put this in there as well. Not exactly urgent but we’d like to upgrade Servo/Stylo/WebRender to serde 0.9 with Macros 1.1, and doing so without waiting for 1.16 (and of course without breaking Firefox) would be nice.
Can we do something on Firefox's end to use the system libgcc on Linux, instead of that rlib?
Flags: needinfo?(acrichton)
Unfortunately I don't think there's a way to say "don't link compiler-rt". In general that's not expected to work in all situations anyway, but it's likely to work for "standard" situations.
Flags: needinfo?(acrichton)
Blocks: 1337153
Attachment #8833428 - Flags: review?(mshal)
I'm doing a custom build of 1.15.0 with the -fPIC change applied. This version doesn't work because I forgot I can't use my normal build environment because of GLIBC version conflicts with the build machines. I'll redo in the docker container and update the patch again.
(In reply to Ralph Giles (:rillian) | needinfo me from comment #28) > I'm doing a custom build of 1.15.0 with the -fPIC change applied. This > version doesn't work because I forgot I can't use my normal build > environment because of GLIBC version conflicts with the build machines. I'll > redo in the docker container and update the patch again. We should wait for 1.15.1, assuming it gets the fix.
1.15.1 will not have that fix. We learned about this right now, and the release is going out right now.
Note that currently we're on track to release 1.15.1 today and the PR to fix this change (https://github.com/rust-lang/rust/pull/39523) hasn't landed yet on master. We can likely consider a 1.15.2 release, but if Gecko's got custom builds locally anyway perhaps the patch could be applied locally for now? We can be sure to get this fix in for 1.16.0 regardless, though.
We're holding of on 1.15.1 to gather more information about this bug from glandium.
Mike, do you know why Firefox's build hits this but we haven't seen it elsewhere? Is there some linker setting we don't have coverage for?
I ask partially because I don't understand the likely impact on this bug for non-Firefoxes.
I'm not glandium, but per above, we pass -Wl,-z,text to the linker, which is "Treat DT_TEXTREL in shared object as error." Adding that to a test build would give you coverage. We also do a separate pass to check the linker output for the same issue. I don't understand the impact though.
(In reply to Brian Anderson from comment #33) > Mike, do you know why Firefox's build hits this but we haven't seen it > elsewhere? Is there some linker setting we don't have coverage for? Firefox builds a staticlib with the Rust runtime and then links that staticlib into a dynamic library with `-Wl,-z,text`, which forbids text relocations. Are you not building something with `-Wl,-z,-text`?
Bug 388971 mentions problems with selinux. The Gentoo wiki cites load time. I guess it's both in that (a) doing the relocations every time a dynamic object is loaded takes time and (b) if the loader can't mark the pages read-only after applying the relocation table. FWIW. https://wiki.gentoo.org/wiki/Hardened/Position_Independent_Code_internals#Relocations_in_the_TEXT_segment_of_shared_libraries_used_by_dynamically_linked_executables
According to Ted, we originally added these checks because building without -fPIC on x86_64 caused breakage, and we didn't have test coverage on that platform at the time. Unlike x86_64, it's optional on i686-linux, in the sense that the loader will make it work, which explains how Rust's test coverage didn't find any issue. We kept the requirement in Firefox for the security, startup time, and run-time memory usage advantages.
Thanks for the clarifications rillian.
Comment on attachment 8833018 [details] Bug 1336155 - Update builders to rust 1.15.0 stable. Clearing review for now, since it sounds like we'll need 1.15.1 or later. Feel free to flag me again when it's ready.
Attachment #8833018 - Flags: review?(mshal)
Actually, the linker won't create text relocations on x86_64, so those checks weren't added for that. They were added explicitly with x86 in mind. For the rest, Ralph's comments are pretty much covering it. Another concern is memory usage, where text pages are dirtied by being relocated, and can't be shared across processes.
We're going to release a fix for this with 1.15.1, aiming for release tomorrow. Thanks for helping us sort it out, all.
Attachment #8833018 - Flags: review?(mshal)
Attachment #8833428 - Flags: review?(mshal)
Comment on attachment 8833018 [details] Bug 1336155 - Update builders to rust 1.15.0 stable. Custom build works, so I'd like to go ahead and land it. Will bump to 1.15.1 when it's available.
Attachment #8833018 - Flags: review?(mshal)
Comment on attachment 8833018 [details] Bug 1336155 - Update builders to rust 1.15.0 stable. https://reviewboard.mozilla.org/r/109256/#review112578 Seems reasonable to me.
Attachment #8833018 - Flags: review?(mshal) → review+
Attachment #8833428 - Flags: review?(mshal) → review+
Rust 1.15.1 is out.
Pushed by rgiles@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/668b30061518 Update rust repack script for 1.15.0 stable. r=mshal https://hg.mozilla.org/integration/autoland/rev/8f30dbd7f51c Update builders to rust 1.15.0 stable. r=mshal https://hg.mozilla.org/integration/autoland/rev/62adf4084bae Update linux32 rust builders to use -fPIC. r=mshal
Blocks: 1338311
Product: Core → Firefox Build System
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: