Closed Bug 1336155 Opened 7 years ago Closed 7 years ago

Update builders to rust 1.15

Categories

(Firefox Build System :: General, defect)

defect
Not set
normal

Tracking

(firefox54 fixed)

RESOLVED FIXED
mozilla54
Tracking Status
firefox54 --- fixed

People

(Reporter: rillian, Assigned: rillian)

References

Details

Attachments

(3 files)

Rust 1.15.0-stable is released today. Update our integration build config to use it for Firefox 54.
Assignee: nobody → giles
Blocks: oxidation
Comment on attachment 8833017 [details]
Bug 1336155 - Update rust repack script for 1.15.0 stable.

https://reviewboard.mozilla.org/r/109254/#review110400
Attachment #8833017 - Flags: review?(mshal) → review+
Comment on attachment 8833018 [details]
Bug 1336155 - Update builders to rust 1.15.0 stable.

https://reviewboard.mozilla.org/r/109256/#review110402

These look fine to me, but it seems the linux32 builds don't agree. Any idea what's up with those?
The link is failing:

> /home/worker/workspace/build/src/gcc/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.9.4/../../../../x86_64-unknown-linux-gnu/bin/ld: read-only segment has dynamic relocations.
> collect2: error: ld returned 1 exit status

I have no insight beyond that. I can reproduce locally (with gold from fedora 25) so I don't think it's an issue with our packaged linker. Seems to be a regression from 1.14. Someday I'll get around to testing all platforms before release...

I suggest we just keep linux32 on rust 1.14 for now, removing it from the patch.
I suggest we try to find what's wrong first. In all likeliness, something is wrong in one of rust's static libraries. Push another try with -Wl,-z,text removed from LDFLAGS (it's added in old-configure.in and js/src/old-configure.in). It's going to be green, and will allow to look at the resulting libxul.so to see what ld is complaining about when -Wl,-z,text is passed. (LD sucks for not having more useful errors in that case)
Specifically, we're not in a rush to use this new rust release on the very day it's released, especially if we end up not using it on all platforms.
1.15 has stable procedural macros, which are the new and only way to derive impls for serde in its 0.9 version. My pull requests are pretty much ready to go for the serde bump. But serde_codegen doesn't exist anymore for serde 0.9, so to not block people working on WR with Gecko, we need Rust 1.15 here.

So it would be great if that bump could be done fast, I don't like serde bumps to linger for long.
(In reply to Anthony Ramine [:nox] from comment #8)
> 1.15 has stable procedural macros, which are the new and only way to derive
> impls for serde in its 0.9 version. My pull requests are pretty much ready
> to go for the serde bump. But serde_codegen doesn't exist anymore for serde
> 0.9, so to not block people working on WR with Gecko, we need Rust 1.15 here.

The suggestion in comment 5 wouldn't help you in any way.
(In reply to Mike Hommey [:glandium] from comment #9)
> The suggestion in comment 5 wouldn't help you in any way.

I'm pretty sure I didn't imply that.
FWIW I've never seen that before, much less anywhere in in the rust-lang/rust issue tracker. I'd be quite interested to learn about the fix though!

The biggest change I can think of is that we're compiling compilers with a new build system, but other than that no change leaps to mind in 1.15.0. Even that change itself I wouldn't know why would cause a bug such as this...
I've not sure what glandium was looking for in comment 6, but I pushed to try with the -Wl,-z,text ("Treat DT_TEXTREL in shared object as error.") option removed. The link succeeds, but the build subsequently fails with

> TEST-UNEXPECTED-FAIL | check_textrel | We do not want text relocations in libraries and programs

Which confirms it's a text segment relocation, but I don't see any information about which symbol has the relocation. I'll try running that check locally.

There's also an xpcshell crash which might tell us something. How do I put symbols to the stack dump in the log?

https://treeherder.mozilla.org/#/jobs?repo=try&revision=563f566da23d&selectedJob=74379135
Gah forgot about that check. Remove it so that the build doesn't fail.

The xpcshell crash is a red herring. It's a normal occurrence, and the log viewer is just confused by it.
For posterity, here's how you check those things:
- Get the address of the end of the executable segment(s)
  e.g. readelf -lW libxul.so | awk '$1 == "LOAD" && $7 != "RW" {a=$3;b=$6}END{print strtonum(a)+strtonum(b)}'
  in the build I looked at, it's 61803364 (decimal, 0x3af0b64 hex)
- Find all relocations in the executable segment(s)
  e.g. readelf -r libxul.so | awk 'strtonum("0x" $1) <= 61803364 && NF >= 3 && NF < 6'
  in the build I looked at, that returns the following:

02af13cf  00000008 R_386_RELATIVE   
02af1505  00000008 R_386_RELATIVE   
02af150f  00000008 R_386_RELATIVE   
02af18cb  00000008 R_386_RELATIVE   
02af19f9  00000008 R_386_RELATIVE   
02af1a03  00000008 R_386_RELATIVE   
02af1d0c  00000008 R_386_RELATIVE   
02af2164  00000008 R_386_RELATIVE   
02af112b  00035002 R_386_PC32        00000000   fmax@GLIBC_2.1
02af1133  00024602 R_386_PC32        00000000   logb@GLIBC_2.0
02af11a3  000d2102 R_386_PC32        00000000   scalbn@GLIBC_2.0
02af11b7  000d2102 R_386_PC32        00000000   scalbn@GLIBC_2.0
02af11f9  000d2102 R_386_PC32        00000000   scalbn@GLIBC_2.0
02af1225  000d2102 R_386_PC32        00000000   scalbn@GLIBC_2.0
02af165e  0004eb02 R_386_PC32        00000000   fmaxf@GLIBC_2.1
02af1666  00048002 R_386_PC32        00000000   logbf@GLIBC_2.0
02af16c8  00021002 R_386_PC32        00000000   scalbnf@GLIBC_2.0
02af16dc  00021002 R_386_PC32        00000000   scalbnf@GLIBC_2.0
02af171a  00021002 R_386_PC32        00000000   scalbnf@GLIBC_2.0
02af1742  00021002 R_386_PC32        00000000   scalbnf@GLIBC_2.0

The important part is really the first column. For each of those, we can use addr2line -e libxul.so 0xaddr to see what symbol the relocation is happening in. (That requires downloading the crashreporter-symbols-full.zip and getting libxul.so.dbg from there)

Unfortunately, that only gave ??:? results, which suggests we don't have debuginfo for whatever this is happening in.

Next step is to look at the symbols table:
e.g. readelf -sW firefox/libxul.so.dbg | sort -k 2

After correlating with the addresses from the relocations, the culprits are:
118343: 02af1f90  1105 FUNC    LOCAL  DEFAULT   12 __mulsc3
151843: 02af1640   986 FUNC    LOCAL  DEFAULT   12 __divsc3
240176: 02af1b30  1113 FUNC    LOCAL  DEFAULT   12 __muldc3
240995: 02af10e0  1117 FUNC    LOCAL  DEFAULT   12 __divdc3

Those are things that come from libgcc.a. If I look at current linux32 nightly, those symbols are also there too. But I'm pretty sure before we were using rust, we were pulling them from libgcc.so.

Looking at the rust release:
readelf -sW rust-1.15.0-i686-unknown-linux-gnu/rust-std-i686-unknown-linux-gnu/lib/rustlib/i686-unknown-linux-gnu/lib/libcompiler_builtins-b4ef8c9d93bc879a.rlib  | grep '__\(div\|mul\)[ds]c3'
    10: 00000000  1117 FUNC    GLOBAL HIDDEN     4 __divdc3
    10: 00000000   986 FUNC    GLOBAL HIDDEN     4 __divsc3
    10: 00000000  1113 FUNC    GLOBAL HIDDEN     4 __muldc3
    10: 00000000  1105 FUNC    GLOBAL HIDDEN     4 __mulsc3

They were there in 1.14:
readelf -sW rust-1.14.0-i686-unknown-linux-gnu/rust-std-i686-unknown-linux-gnu/lib/rustlib/i686-unknown-linux-gnu/lib/libcompiler_builtins-f5a209a9.rlib  | grep '__\(div\|mul\)[ds]c3'
    21: 00000000  1189 FUNC    GLOBAL HIDDEN     2 __divdc3
    21: 00000000  1077 FUNC    GLOBAL HIDDEN     2 __divsc3
    20: 00000000  1129 FUNC    GLOBAL HIDDEN     2 __muldc3
    20: 00000000  1121 FUNC    GLOBAL HIDDEN     2 __mulsc3

Note the size differences.

So, to me, there are two problems here:
- In all likeliness, the entire libcompiler_builtins library is not built with -fPIC in 1.15.
- It looks like libcompiler_builtins is essentially libgcc, and it feels like we should be using libgcc instead.
> But I'm pretty sure before we were using rust, we were pulling them from libgcc.so.

Confirmed with 45.7.0esr:

$ readelf --dyn-sym firefox/libxul.so | grep '__\(div\|mul\)[ds]c3'
   383: 00000000     0 FUNC    GLOBAL DEFAULT  UND __muldc3@GCC_4.0.0 (23)
  1545: 00000000     0 FUNC    GLOBAL DEFAULT  UND __divsc3@GCC_4.0.0 (23)
  3588: 00000000     0 FUNC    GLOBAL DEFAULT  UND __divdc3@GCC_4.0.0 (23)
  3622: 00000000     0 FUNC    GLOBAL DEFAULT  UND __mulsc3@GCC_4.0.0 (23)
Thanks for the investigation! Apparently me from a long time ago [1] thought that 32-bit didn't need -fPIC, but I can't remember why at this point. I've re-enabled this [2], published a new version, and send a PR upstream [3]. Once that lands we can backport to beta as well. We've also already got a 1.15.1 point release coming out soon, so if urgent we can put this in there as well.

[1]: https://github.com/alexcrichton/gcc-rs/commit/362bdf20
[2]: https://github.com/alexcrichton/gcc-rs/commit/7d37a9a6
[3]: https://github.com/rust-lang/rust/pull/39523
(In reply to Alex Crichton [:acrichto] from comment #21)
> We've also already got a 1.15.1 point release coming out soon,
> so if urgent we can put this in there as well.

Not exactly urgent but we’d like to upgrade Servo/Stylo/WebRender to serde 0.9 with Macros 1.1, and doing so without waiting for 1.16 (and of course without breaking Firefox) would be nice.
Can we do something on Firefox's end to use the system libgcc on Linux, instead of that rlib?
Flags: needinfo?(acrichton)
Unfortunately I don't think there's a way to say "don't link compiler-rt". In general that's not expected to work in all situations anyway, but it's likely to work for "standard" situations.
Flags: needinfo?(acrichton)
Blocks: 1337153
Attachment #8833428 - Flags: review?(mshal)
I'm doing a custom build of 1.15.0 with the -fPIC change applied. This version doesn't work because I forgot I can't use my normal build environment because of GLIBC version conflicts with the build machines. I'll redo in the docker container and update the patch again.
(In reply to Ralph Giles (:rillian) | needinfo me from comment #28)
> I'm doing a custom build of 1.15.0 with the -fPIC change applied. This
> version doesn't work because I forgot I can't use my normal build
> environment because of GLIBC version conflicts with the build machines. I'll
> redo in the docker container and update the patch again.

We should wait for 1.15.1, assuming it gets the fix.
1.15.1 will not have that fix. We learned about this right now, and the release is going out right now.
Note that currently we're on track to release 1.15.1 today and the PR to fix this change (https://github.com/rust-lang/rust/pull/39523) hasn't landed yet on master. We can likely consider a 1.15.2 release, but if Gecko's got custom builds locally anyway perhaps the patch could be applied locally for now? We can be sure to get this fix in for 1.16.0 regardless, though.
We're holding of on 1.15.1 to gather more information about this bug from glandium.
Mike, do you know why Firefox's build hits this but we haven't seen it elsewhere? Is there some linker setting we don't have coverage for?
I ask partially because I don't understand the likely impact on this bug for non-Firefoxes.
I'm not glandium, but per above, we pass -Wl,-z,text to the linker, which is "Treat DT_TEXTREL in shared object as error." Adding that to a test build would give you coverage. We also do a separate pass to check the linker output for the same issue. I don't understand the impact though.
(In reply to Brian Anderson from comment #33)
> Mike, do you know why Firefox's build hits this but we haven't seen it
> elsewhere? Is there some linker setting we don't have coverage for?

Firefox builds a staticlib with the Rust runtime and then links that staticlib into a dynamic library with `-Wl,-z,text`, which forbids text relocations.  Are you not building something with `-Wl,-z,-text`?
Bug 388971 mentions problems with selinux. The Gentoo wiki cites load time. I guess it's both in that (a) doing the relocations every time a dynamic object is loaded takes time and (b) if the loader can't mark the pages read-only after applying the relocation table. FWIW. https://wiki.gentoo.org/wiki/Hardened/Position_Independent_Code_internals#Relocations_in_the_TEXT_segment_of_shared_libraries_used_by_dynamically_linked_executables
According to Ted, we originally added these checks because building without -fPIC on x86_64 caused breakage, and we didn't have test coverage on that platform at the time. Unlike x86_64, it's optional on i686-linux, in the sense that the loader will make it work, which explains how Rust's test coverage didn't find any issue. We kept the requirement in Firefox for the security, startup time, and run-time memory usage advantages.
Thanks for the clarifications rillian.
Comment on attachment 8833018 [details]
Bug 1336155 - Update builders to rust 1.15.0 stable.

Clearing review for now, since it sounds like we'll need 1.15.1 or later. Feel free to flag me again when it's ready.
Attachment #8833018 - Flags: review?(mshal)
Actually, the linker won't create text relocations on x86_64, so those checks weren't added for that. They were added explicitly with x86 in mind.

For the rest, Ralph's comments are pretty much covering it. Another concern is memory usage, where text pages are dirtied by being relocated, and can't be shared across processes.
We're going to release a fix for this with 1.15.1, aiming for release tomorrow. Thanks for helping us sort it out, all.
Attachment #8833018 - Flags: review?(mshal)
Attachment #8833428 - Flags: review?(mshal)
Comment on attachment 8833018 [details]
Bug 1336155 - Update builders to rust 1.15.0 stable.

Custom build works, so I'd like to go ahead and land it. Will bump to 1.15.1 when it's available.
Attachment #8833018 - Flags: review?(mshal)
Comment on attachment 8833018 [details]
Bug 1336155 - Update builders to rust 1.15.0 stable.

https://reviewboard.mozilla.org/r/109256/#review112578

Seems reasonable to me.
Attachment #8833018 - Flags: review?(mshal) → review+
Comment on attachment 8833428 [details]
Bug 1336155 - Update linux32 rust builders to use -fPIC.

https://reviewboard.mozilla.org/r/109658/#review112580
Attachment #8833428 - Flags: review?(mshal) → review+
Rust 1.15.1 is out.
Pushed by rgiles@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/668b30061518
Update rust repack script for 1.15.0 stable. r=mshal
https://hg.mozilla.org/integration/autoland/rev/8f30dbd7f51c
Update builders to rust 1.15.0 stable. r=mshal
https://hg.mozilla.org/integration/autoland/rev/62adf4084bae
Update linux32 rust builders to use -fPIC. r=mshal
Blocks: 1338311
Product: Core → Firefox Build System
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: