Perma [tier2] OS X Cross Compiled addon bustages timed out after 2400 seconds of no output when Gecko 76 merges to Beta on 2020-04-06
Categories
(Firefox Build System :: General, defect)
Tracking
(firefox-esr68 unaffected, firefox74 unaffected, firefox75 unaffected, firefox76+ verified)
Flag | Tracking | Status |
---|---|---|
firefox-esr68 | --- | unaffected |
firefox74 | --- | unaffected |
firefox75 | --- | unaffected |
firefox76 | + | verified |
People
(Reporter: dvarga, Assigned: away)
References
(Regression)
Details
(Keywords: regression)
Attachments
(1 file)
Central as beta sim: https://treeherder.mozilla.org/#/jobs?repo=try&resultStatus=testfailed%2Cbusted%2Cexception%2Crunnable&revision=e15fa168b66d52a5928e12b1e3621936c5ee8843
Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=292846234&repo=try&lineNumber=56289
[task 2020-03-12T14:45:10.765Z] 14:45:10 INFO - make[4]: Leaving directory '/builds/worker/workspace/obj-build/toolkit/library/rust'
[task 2020-03-12T15:25:10.787Z] 15:25:10 INFO - Automation Error: mozprocess timed out after 2400 seconds running ['/builds/worker/checkouts/gecko/obj-x86_64-pc-linux-gnu/_virtualenvs/init/bin/python', 'mach', '--log-no-times', 'build', '-v']
[task 2020-03-12T15:25:10.822Z] 15:25:10 ERROR - timed out after 2400 seconds of no output
[task 2020-03-12T15:25:10.822Z] 15:25:10 ERROR - Return code: -15
[task 2020-03-12T15:25:10.822Z] 15:25:10 WARNING - setting return code to 2
[task 2020-03-12T15:25:10.822Z] 15:25:10 FATAL - 'mach build -v' did not run successfully. Please check log for errors.
[task 2020-03-12T15:25:10.822Z] 15:25:10 FATAL - Running post_fatal callback...
[task 2020-03-12T15:25:10.822Z] 15:25:10 FATAL - Exiting -1
[task 2020-03-12T15:25:10.822Z] 15:25:10 INFO - [mozharness: 2020-03-12 15:25:10.822924Z] Finished build step (failed)
[task 2020-03-12T15:25:10.822Z] 15:25:10 INFO - Running post-run listener: _parse_build_tests_ccov
[task 2020-03-12T15:25:10.823Z] 15:25:10 INFO - Running command: ['/builds/worker/checkouts/gecko/obj-x86_64-pc-linux-gnu/_virtualenvs/init/bin/python', 'mach', 'python', 'testing/parse_build_tests_ccov.py'] in /builds/worker/checkouts/gecko
[task 2020-03-12T15:25:10.823Z] 15:25:10 INFO - Copy/paste: /builds/worker/checkouts/gecko/obj-x86_64-pc-linux-gnu/_virtualenvs/init/bin/python mach python testing/parse_build_tests_ccov.py
[task 2020-03-12T15:25:10.825Z] 15:25:10 INFO - Using env: {'ACCEPTED_MAR_CHANNEL_IDS': 'firefox-mozilla-beta,firefox-mozilla-release',
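For readers unfamiliar with this failure mode: the harness kills the build when it prints nothing for 2400 seconds (note the 40-minute gap between the 14:45:10 and 15:25:10 log lines), and `Return code: -15` is consistent with the process having received SIGTERM. A minimal Python sketch of such a no-output watchdog follows. It is not mozprocess's actual implementation; `run_with_output_timeout` and its parameters are hypothetical names chosen for illustration, and the negative return code assumes a POSIX system.

```python
import subprocess
import sys
import threading
import time


def run_with_output_timeout(cmd, output_timeout):
    """Run cmd and terminate it if it prints nothing for output_timeout seconds.

    Hypothetical helper for illustration only; not the mozprocess API.
    Returns (returncode, timed_out, captured_lines).
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    last_output = [time.monotonic()]  # mutable cell shared with the watchdog
    timed_out = threading.Event()

    def watchdog():
        # Poll until the process exits or the silence exceeds the limit.
        while proc.poll() is None:
            if time.monotonic() - last_output[0] > output_timeout:
                timed_out.set()
                proc.terminate()  # sends SIGTERM on POSIX
                return
            time.sleep(0.05)

    thread = threading.Thread(target=watchdog, daemon=True)
    thread.start()

    lines = []
    for line in proc.stdout:  # ends when the child closes its output
        last_output[0] = time.monotonic()
        lines.append(line.rstrip("\n"))

    proc.wait()
    thread.join()
    return proc.returncode, timed_out.is_set(), lines


if __name__ == "__main__":
    # A child that prints once and then goes silent, like the hung build.
    child = [sys.executable, "-c",
             "import time; print('start', flush=True); time.sleep(30)"]
    print(run_with_output_timeout(child, output_timeout=0.5))
```

On POSIX, a child killed this way reports a return code equal to the negated signal number, which is where a value like -15 comes from.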
Comment 1 (Reporter) • 5 years ago
Mike, could this be the culprit for these bustages: https://hg.mozilla.org/mozilla-central/rev/cad040db602cd5e194780280fd0fe3d338a48638?
Comment 2 • 5 years ago
It doesn't look related to me, though it's hard to tell since the build appears to hang and then time out from no output. Ricky recently investigated a Windows hang that had similar behavior, so maybe he has an idea of how to figure out what's going on here (or maybe bug 1622109 can help). While it's not likely to be the same underlying issue since this is an OSX build, I think the same debugging techniques will help.
Comment 3 • 5 years ago
Pushlog from the last good rev to the first bad one: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=884162af76f5225bbf4efe486959d2fa9757bc56&tochange=ffd615bf92ddb28a01b881d14126fc139ebf7880
Looking through the log, it all goes well until here: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=292846234&repo=try&lineNumber=29505. After that there are a lot of warnings such as "warning: trait objects without an explicit" and "warning: use of deprecated item..." after running cargo stuff.
Adam, could the changes in https://bugzilla.mozilla.org/show_bug.cgi?id=1515451#c9 have something to do with these timeouts?
Try push with the backout: https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&resultStatus=success%2Ctestfailed%2Cbusted%2Cexception%2Crunning%2Cpending%2Crunnable&revision=c08c5fdbce2654169a3d58b4184bebac9407b274&searchStr=os%2Cx%2Ccross%2Ccompiled%2Caddon%2Copt%2Cbuild-macosx64-add-on-devel%2Fopt%2C%28b%29&selectedJob=293231539
Comment 4 • 5 years ago
I don't think it's related to bug 1515451. The try push still seems to fail with those changes backed out.
Comment 6 • 5 years ago
- sim with https://hg.mozilla.org/mozilla-central/rev/5e32bdf73dc213e9944205411baa0d51df681140 as parent, failure does not occur: https://treeherder.mozilla.org/#/jobs?repo=try&revision=dc3326901c4de78548f5676d8925ae7ec21fe71c
- sim with https://hg.mozilla.org/mozilla-central/rev/4fd5c458be4c3bc2d1f22bd575667104a5d173fe as parent, first runs fail with bug 1411358, then with bug 1622496: https://treeherder.mozilla.org/#/jobs?repo=try&revision=e62cd772eadbed4b44d21e065b09489d9af736bc&selectedJob=293306981
Comment 8 • 5 years ago
Bisection shows it's a regression from bug 1619461.
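The thread does not say how the bisection was carried out. As a general illustration, given a range with a known-good first revision and a known-bad last revision (like the pushlog in comment 3), a binary search finds the first bad revision in about log2(n) checks. In the sketch below, `first_bad` and the `is_bad` callback are hypothetical; in practice `is_bad` would stand for running a beta simulation on that revision and checking whether the build times out.

```python
def first_bad(revisions, is_bad):
    """Return the first bad revision in an oldest-to-newest list.

    Assumes revisions[0] is known good and revisions[-1] is known bad,
    and that every revision after the first bad one is also bad.
    """
    lo, hi = 0, len(revisions) - 1  # lo is always good, hi is always bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid
        else:
            lo = mid
    return revisions[hi]


if __name__ == "__main__":
    # Toy data: ten placeholder revisions, with the regression landing at "r6".
    revs = [f"r{i}" for i in range(10)]
    checked = []

    def is_bad(rev):
        checked.append(rev)
        return int(rev[1:]) >= 6

    print(first_bad(revs, is_bad), "after checking", checked)
```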
Comment 9 • 5 years ago
(In reply to Sebastian Hengst [:aryx] (needinfo on intermittent or backout) from comment #8)
> Bisection shows it's a regression from bug 1619461.
Thanks!
Comment 10 (Assignee) • 5 years ago
Could someone give me step-by-step instructions for reproducing this on try using an m-c base of my choice?
Comment 11 • 5 years ago
Based on https://wiki.mozilla.org/Sheriffing/How_To/Beta_simulations#TRUNK_AS_EARLY_BETA
hg update -r revofyourchoice
./mach try release -v 76.0b1 --tasks release-sim --migration central-to-beta --no-push
hg commit -m "Early beta config"
./mach try chooser --full
In the chooser, pick "opt" and "macOS add-on devel", or whatever it is called.
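The same steps can be collected in a small dry-run helper that only prints the commands, so they can be reviewed before running anything. This is a sketch: `beta_sim_commands` and `render` are hypothetical names, the command arguments are copied from the steps above, and the revision is a placeholder you supply.

```python
import shlex


def beta_sim_commands(revision, version="76.0b1"):
    """Build the command sequence from the steps above for a given base revision."""
    return [
        ["hg", "update", "-r", revision],
        ["./mach", "try", "release", "-v", version,
         "--tasks", "release-sim",
         "--migration", "central-to-beta",
         "--no-push"],
        ["hg", "commit", "-m", "Early beta config"],
        ["./mach", "try", "chooser", "--full"],
    ]


def render(commands):
    """Quote each command so it can be pasted into a shell (Python 3.8+)."""
    return [shlex.join(cmd) for cmd in commands]


if __name__ == "__main__":
    # "revofyourchoice" is the placeholder used in the steps above.
    for line in render(beta_sim_commands("revofyourchoice")):
        print(line)
```

To execute rather than print, each list could be passed to `subprocess.run(cmd, check=True)` from the root of a gecko checkout; the final chooser step is interactive, so the task selection still has to be done by hand.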
Comment 12 (Assignee) • 5 years ago
> ./mach try release -v 76.0b1 --tasks release-sim --migration central-to-beta --no-push
Nice, that's very easy, thanks! I can reproduce the issue and am investigating.
Comment 13 (Assignee) • 5 years ago
I tracked this down to a compiler bug which I filed upstream as https://bugs.llvm.org/show_bug.cgi?id=45253.
In the meantime we'll need to work around this. There are various sizes of hammer we could use...
1. Disable the flag for individual compilations on mac. I'm listing this for completeness, but I don't think we should do this. IIUC the problematic struct is in a header and the set of affected TUs can change.
2. --disable-new-pass-manager for the OSX addon-devel build.
3. Disable for all mac targets in configure.
I lean towards option 2, knowing that it's potentially whac-a-mole if this pops up in other build flavors, but I really don't want to undo the perf gains in shippable builds.
glandium, do you have any preference, or other ideas?
Comment 14 • 5 years ago
Any idea why it only happens on those builds and not others? Could it be an early-beta-or-earlier thing that saves the beta, and it would fail even further down the release pipeline?
https://wiki.mozilla.org/Sheriffing/How_To/Beta_simulations#TRUNK_AS_LATE_BETA
Comment 15 • 5 years ago
It also fails for late beta builds.
Comment 16 (Assignee) • 5 years ago
(In reply to Mike Hommey [:glandium] from comment #14)
> Any idea why it only happens on those builds and not others?
I don't know. I assume that the pervasive code additions from debug, ccov, asan, and profiling make enough changes to avoid triggering the problem. I can tell you that it's not specific to addon-devel. If I trigger normal opt builds with a beta simulation, they time out too. The next question might be why beta opt builds fail and trunk opt builds succeed. I diffed the preprocessed sources and the main difference was DMD; however, --disable-dmd on trunk still didn't make us time out, so it's not that in isolation. At this point I don't have any energy left for investigating further builds; we should figure out how to move forward.
Comment 17 (Assignee) • 5 years ago
Comment 18 • 5 years ago
Comment 19 (Reporter) • 5 years ago
bugherder
Comment 20 • 5 years ago
Verified fixed in latest beta sim: https://treeherder.mozilla.org/#/jobs?repo=try&resultStatus=testfailed%2Cbusted%2Cexception%2Cretry%2Cusercancel%2Crunnable&revision=520188a0b71f5d5856d3b780e3774748bbd0fe56&searchStr=osx
Comment 21 • 5 years ago
dmajor, does the improvement below make sense for this patch?
== Change summary for alert #25520 (as of Tue, 31 Mar 2020 07:16:38 GMT) ==
Improvements:
20% build times osx-cross debug taskcluster-c5d.4xlarge 1,828.03 -> 1,463.57
For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=25520
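As a quick sanity check on the "20%" figure, the relative change can be recomputed from the two values in the alert. The helper name below is made up for illustration, and the alert does not state the unit, so the values are treated as plain numbers.

```python
def percent_improvement(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


if __name__ == "__main__":
    # Values taken from alert #25520 as quoted above.
    pct = percent_improvement(1828.03, 1463.57)
    print(f"{pct:.1f}% lower")
```

The result is just under 20%, which matches the rounded figure in the alert.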
Comment 22 (Assignee) • 5 years ago
Yes, this makes sense. If I set the graph to 30 days, this appears to counter the regression in alert #25331.