Open Bug 1337955 Opened 7 years ago Updated 2 years ago

Test Firefox with Rust beta and nightly toolchains

Categories

(Firefox Build System :: General, defect)

Tracking

(Not tracked)

People

(Reporter: rillian, Unassigned)

Attachments

(1 file, 2 obsolete files)

Several times we've found a portability problem with one of our tier-1 platforms when we update to a new Rust stable release, e.g. rustbuild dropping -fPIC on i686-linux (bug 1336155, https://github.com/rust-lang/rust/pull/39523) and armv7-linux-androideabi requiring neon (bug 1323773).

We should run some kind of periodic integration test against forthcoming releases (Rust nightly builds, and especially each beta release) to detect these problems sooner.
Blocks: oxidation
glandium's working on bug 1313111, which will make things like this a lot more tractable.
Linux32 builds fail on rust 1.16.0-beta.1 with the same -fPIC issue 1.15.0 had. This should be fixed in the next beta release.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=ab06245bb10c
FWIW rust 1.16.0-beta.2 is out now, which should have the -fPIC issue fixed.
Thanks Alex. Confirmed 1.16.0-beta.2 passes our integration test. https://treeherder.mozilla.org/#/jobs?repo=try&revision=4d7f94486456
Thanks for testing Ralph!
Thanks for testing rillian!
Otoh, we seem to have a problem with 1.17-nightly on MacOS:

> 15:26:55     INFO -  checking rustc version...
> 15:26:55     INFO -  DEBUG: Executing: `/builds/slave/try-m64-0000000000000000000000/build/src/rustc/bin/rustc --version --verbose`
> 15:26:55     INFO -  DEBUG: The command returned non-zero exit status -11.
> 15:26:55     INFO -  ERROR: Command `/builds/slave/try-m64-0000000000000000000000/build/src/rustc/bin/rustc --version --verbose` failed with exit status -11.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=8de49f284c13a6d702ef60906752eb76b3505fa4

I see this problem with the March 12 build, and with a March 3 build from last week. Did anything change in the build environment between 1.16 and 1.17 which would prevent the binaries from running on macOS 10.7? I know you've been moving things from buildbot to docker and travis. Maybe a newer MACOSX_DEPLOYMENT_TARGET?
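
For reference, the `-11` in the log above reads as "killed by signal 11" if the status follows the convention of Python's `subprocess` module, where a negative return code `-N` means the child was terminated by signal `N`. That is an assumption about how the configure step reports the status. A minimal sketch of decoding such a value; `describe_status` and its small signal table are illustrative names, not part of any build tooling:

```rust
/// Describe an exit status reported in the Python `subprocess` style,
/// where a negative value -N means "terminated by signal N".
/// Hypothetical helper, for illustration only.
fn describe_status(status: i32) -> String {
    if status == 0 {
        return "success".to_string();
    }
    if status > 0 {
        return format!("exited with code {}", status);
    }
    // Negative status: the magnitude is the signal number.
    let signal = status.unsigned_abs();
    // These numbers are the same on Linux and macOS.
    let name = match signal {
        4 => "SIGILL",
        6 => "SIGABRT",
        9 => "SIGKILL",
        11 => "SIGSEGV",
        15 => "SIGTERM",
        _ => "unknown signal",
    };
    format!("terminated by signal {} ({})", signal, name)
}
```

Under that assumption, `describe_status(-11)` reports SIGSEGV, i.e. rustc crashed before it could print its version.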
Flags: needinfo?(acrichton)
Ah yes, I believe that in the migration to the new build system we forgot that env var. Here's an issue for it: https://github.com/rust-lang/rust/issues/40481

I'll try to have it fixed soon (and we'll need to backport this to the soon-to-be-released 1.17.0 beta).
Flags: needinfo?(acrichton)
Thanks, Alex! I'll keep an eye on it.
Attachment #8846740 - Attachment is obsolete: true
Attachment #8846741 - Attachment is obsolete: true
rust 1.17.0-beta.2 is still failing like #16 on macOS, even though that build should have the fix from rust-lang/rust#40600.

Alex, any further ideas here? I thought I tested 1.17 nightly, but it's not written down here, so maybe not. Perhaps the change to `MACOSX_DEPLOYMENT_TARGET=10.8` (rust-lang/rust#40482) is insufficient? The failing builds are on macOS 10.7. I'll try 1.18-nightly to confirm.
Flags: needinfo?(acrichton)
Oh right I think Gecko *builds* on 10.7, right? As opposed to building on a newer platform and targeting 10.7?

If that's the case then we may be running out of luck unfortunately. LLVM no longer builds when using a newer toolchain (like the one we're using on Travis) with MACOSX_DEPLOYMENT_TARGET=10.7. I haven't looked into what would be necessary to rectify the situation, but can you confirm that you're attempting to run rustc on a 10.7 mac? (that at least I'm pretty sure is likely to fail)
Flags: needinfo?(acrichton)
That's correct - our OSX build machines are running 10.7.
Thanks for confirming, Ryan. Our build machines are running macOS 10.7, but we *target* 10.9 since Firefox 49. So we need a toolchain which can run in the 10.7 build environment.
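
The difference between the build host and the target can be made concrete with a small version check. This is only a sketch, assuming that a toolchain built with `MACOSX_DEPLOYMENT_TARGET=X` is expected to run only on macOS X or later; the `OsVersion` type and the `can_run_on` function are hypothetical names, not part of rustc or the Gecko build system:

```rust
/// A macOS version such as 10.7, ordered by (major, minor).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct OsVersion {
    major: u32,
    minor: u32,
}

impl OsVersion {
    /// Parse a "major.minor" string such as "10.7".
    /// Returns None for anything else.
    fn parse(s: &str) -> Option<OsVersion> {
        let mut parts = s.trim().splitn(2, '.');
        let major = parts.next()?.parse().ok()?;
        let minor = parts.next()?.parse().ok()?;
        Some(OsVersion { major, minor })
    }
}

/// Assumption: a binary built with the given deployment target is only
/// expected to run on hosts at that version or newer.
fn can_run_on(deployment_target: OsVersion, host: OsVersion) -> bool {
    host >= deployment_target
}
```

With a 10.7 host and a rustc built for 10.8, `can_run_on` returns false, which matches the failure on the builders. The 10.9 target that Firefox itself is compiled for does not enter into this check at all.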

It may be we can race a gecko requirement for rust >= 1.17.0 with a ci upgrade. IIRC the build machines are on 10.7 because:

 - We needed to maintain a pool of 10.7 machines for the Firefox extended support releases, and there were concerns about splitting the available hardware pool.

 - We couldn't obtain resources to re-image them.

 - We want to transition to linux-hosted cross-compile builds for macOS targets.

The last extended support release targeting macOS 10.6 is Firefox 45.9 esr, scheduled for April 18th. It may be after that we can upgrade the build environment. I don't know the status of the taskcluster mac build promotion to tier-1. Amy, could you please comment on relative timelines for the three paths? It's been about a year since bug 1269798, and this will become a blocker for quantum in a few months.
Flags: needinfo?(arich)
>  - We want to transition to linux-hosted cross-compile builds for macOS
> targets.
> 

Currently we are investigating a performance regression in the TaskCluster cross-compiled builds. This investigation will help us decide whether we can continue using cross-compiled builds in the near future or need to keep building on mac hardware. ted and wcosta are digging into this. By the end of this week we hope to know whether we will invest more effort in these builds or need to plan an interim solution using mac builds. I can update this bug once we have more information (EOW).
Flags: needinfo?(arich)
See Also: → 1338651
Great, thanks.
Confirming that today's rustc 1.17.0-nightly (ccce2c6eb 2017-03-27) fails the same way, consistent with #21. https://treeherder.mozilla.org/#/jobs?repo=try&revision=3ff004ab7603ba623237dfaefa25836251c52006
I will resume investigation into why we compile rustc for 10.8 soon. Hopefully there's some escape hatch to use to avoid blocking on upgrading infrastructure.
Ok, I've done some more testing on our end. The state of play I've found is that we specifically cannot compile LLVM for the 10.7 target from the current OSX image that we're using. We are apparently not the first to have run into this issue either (https://github.com/JuliaLang/julia/issues/19762).

This is a regression on our end. We haven't changed LLVM versions in a long time, but we've changed infrastructure. It turns out that the Xcode version we're using generates this error, but *older* Xcode versions do not generate the error. Travis, where we build our releases, has the ability to switch Xcode versions so I tested out a few:

* Xcode 6.4 - CMake was too old to compile LLVM
* Xcode 7 - compiled LLVM successfully
* Xcode 8+ - failed to compile LLVM

So from our end we could fix this regression by switching to compiling the Rust compiler with Xcode 7. Unfortunately though this version comes with lldb 350 which means that we can't run any of our debuginfo/lldb tests. These would otherwise regress fairly often and are sometimes difficult to fix, so that's not something we'd like to do lightly just yet.

So from the rust-lang/rust end we have a few options to fix this regression:

* Switch to xcode 7, turn off our LLDB tests
* Compile releases with xcode 7, leave nightly on xcode 8
* Continue compiling releases with buildbot instead of Travis

To test the waters, what's the sense of urgency with fixing this regression? It sounds like it will specifically prevent Gecko from upgrading rustc version until it is fixed. We could fix it on our end with one of the above strategies (none of which are great unfortunately). If Gecko is soon to be cross-compiled however then this should become a non-issue anyway. 

Does that all make sense? Are there perhaps other opinions about how to best fix this?
I just talked with Brian and we believe we have a solution for this, I will send PRs to rust-lang/rust shortly
I'm hoping that https://github.com/rust-lang/rust/pull/40967 will solve this regression. If that lands I'll backport it to beta and we'll get a new beta out soon.
That change is now being backported to beta as well in https://github.com/rust-lang/rust/issues/40995. That doesn't bump the beta version just yet, but I suspect we will do so soon.
Ok 1.17.0 beta 3 is out, Ralph mind testing it to see if it works?
Looks like 1.17.0-beta.3 restored macOS 10.7 support. Thanks!

https://treeherder.mozilla.org/#/jobs?repo=try&revision=28bd7587a1bfd350b8d13718eb7f1d6057b88738
See Also: → 1321847
rustc 1.18.0-nightly (91ae22a01 2017-04-05) also works on try.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=93d2c11642d6df87cae96f2362fc1d7962dd0780

Last week we had an issue with a unit test failure on 1.18 opt builds; we were checking for a null fn() passed over ffi, which the rust compiler now optimizes away. This was discussed in rust-lang/rust#40913 and bug 1351497. Lang-team consensus was that this was intended behaviour, and :kinetik fixed our code and binding generator to use `Option<fn>` with the nullptr-optimization instead. Thanks also to :kinetik for explaining to me why we want fn() to be non-null even in unsafe rust. :) Those fixes have landed in gecko now.
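
To illustrate the fix described above: Rust assumes a bare `fn` pointer is never null, so a null check on one can be optimized away, while `Option<fn>` represents null as `None` and keeps the size of a plain function pointer. A minimal sketch, with the hypothetical names `Callback`, `call_or_default` and `forty_two` standing in for the real gecko bindings:

```rust
use std::mem::size_of;

/// C-compatible callback type. Hypothetical; the real bindings differ.
type Callback = extern "C" fn() -> i32;

/// Taking Option<Callback> lets the C side pass a null pointer,
/// which Rust sees as None instead of an invalid fn pointer.
extern "C" fn call_or_default(cb: Option<Callback>) -> i32 {
    match cb {
        Some(f) => f(),
        None => -1, // null pointer from the caller
    }
}

/// Example callback used to exercise the function above.
extern "C" fn forty_two() -> i32 {
    42
}

/// The null-pointer optimization: Option adds no space overhead here.
fn option_is_pointer_sized() -> bool {
    size_of::<Option<Callback>>() == size_of::<Callback>()
}
```

Because the layouts match, a C declaration that takes a plain function pointer can stay as it is, and only the Rust side of the binding changes.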
See Also: → 1354994
(In reply to Greg Arndt [:garndt] from comment #24)
> >  - We want to transition to linux-hosted cross-compile builds for macOS
> > targets.
> > 
> 
> Currently we are investigating a performance regression using the
> TaskCluster cross compiled builds.  This investigation will help us decide
> if we can continue with our use of cross compiled builds in the near future
> or if we need to continue building on mac hardware.  ted and wcosta are
> digging into this.  By the end of this week we hope to have some better
> answers if we are going to be investing more effort into these builds or do
> we need to plan an interim solution of using mac builds.  I can update this
> bug once we have some more information (EOW).

We have suspended efforts to improve the timings of the cross-compiled build in order to stand up the buildbot builds and have them scheduled by taskcluster. Once that's complete, we will spend some time investigating the regressions, because in the long term we want to use these builds. I do not have an ETA for when that will be resolved. We've investigated most things we have thought of so far without any luck. If you have any ideas, you can contact wcosta in #taskcluster.
Thanks for the update Greg.

This week's rust nightly still green. https://treeherder.mozilla.org/#/jobs?repo=try&revision=febfddd1ebbb021bddd05e6437cdc313cf965556
1.18.0-beta.1 looks like it's ready to go. https://treeherder.mozilla.org/#/jobs?repo=try&revision=8f4f72d66442
Blocks: 1365300
1.18.0-beta.2 fails building stylo's gecko bindings. I've filed rust-lang/rust#42042 for the ICE and bug 1365300 for stylo tracking and possible work-arounds. https://treeherder.mozilla.org/#/jobs?repo=try&revision=86ebd4aa3836&selectedJob=99289427
rustc 1.19.0-nightly (75b056812 2017-05-15) hits rust-lang/rust#41620 in the stylo code (new warning deprecating float literals in match patterns) but it looks like Simon is aware of the issue, so I think this will be resolved without action on the gecko side (either 1.19 will drop the warning or servo will change their code).
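
For context, the warning concerns `match` arms that use a floating-point literal as the pattern, such as `0.0 => ...`. A minimal sketch of one way to write such a match without literal float patterns, using guards; `classify` is a hypothetical function and this is not the actual stylo code:

```rust
/// Classify a float without using float literals as patterns.
/// An arm like `0.0 => "zero"` is the form the warning targets;
/// here each comparison lives in a match guard instead.
fn classify(x: f32) -> &'static str {
    match x {
        v if v == 0.0 => "zero",
        v if v < 0.0 => "negative",
        v if v > 0.0 => "positive",
        // Only NaN fails all three comparisons.
        _ => "nan",
    }
}
```

The guard form also makes the comparison semantics explicit: both `0.0` and `-0.0` take the first arm, and NaN falls through to the last.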
Depends on: 1367932
Depends on: 1367934
Depends on: 1376010
(In reply to Mike Hommey [:glandium] from comment #46)
> With 1.20.0-beta.1, we get a llvm-dsymutil crash:
> https://public-artifacts.taskcluster.net/DPiAQgT5SNqfF2tlmkG5Lw/0/public/
> logs/live_backing.log

This /could/ be bug 1381043
(In reply to Mike Hommey [:glandium] from comment #47)
> (In reply to Mike Hommey [:glandium] from comment #46)
> > With 1.20.0-beta.1, we get a llvm-dsymutil crash:
> > https://public-artifacts.taskcluster.net/DPiAQgT5SNqfF2tlmkG5Lw/0/public/
> > logs/live_backing.log
> 
> This /could/ be bug 1381043

It is.
Depends on: 1386414
Product: Core → Firefox Build System
Severity: normal → S3