Closed Bug 1622168 Opened 5 years ago Closed 5 years ago

Perma [tier2] OS X Cross Compiled addon bustages timed out after 2400 seconds of no output when Gecko 76 merges to Beta on 2020-04-06

Categories

(Firefox Build System :: General, defect)

defect
Not set
normal

Tracking

(firefox-esr68 unaffected, firefox74 unaffected, firefox75 unaffected, firefox76+ verified)

VERIFIED FIXED
mozilla76
Tracking Status
firefox-esr68 --- unaffected
firefox74 --- unaffected
firefox75 --- unaffected
firefox76 + verified

People

(Reporter: dvarga, Assigned: away)

References

(Regression)

Details

(Keywords: regression)

Attachments

(1 file)

Central as beta sim: https://treeherder.mozilla.org/#/jobs?repo=try&resultStatus=testfailed%2Cbusted%2Cexception%2Crunnable&revision=e15fa168b66d52a5928e12b1e3621936c5ee8843

Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=292846234&repo=try&lineNumber=56289

[task 2020-03-12T14:45:10.765Z] 14:45:10     INFO -  make[4]: Leaving directory '/builds/worker/workspace/obj-build/toolkit/library/rust'
[task 2020-03-12T15:25:10.787Z] 15:25:10     INFO - Automation Error: mozprocess timed out after 2400 seconds running ['/builds/worker/checkouts/gecko/obj-x86_64-pc-linux-gnu/_virtualenvs/init/bin/python', 'mach', '--log-no-times', 'build', '-v']
[task 2020-03-12T15:25:10.822Z] 15:25:10    ERROR - timed out after 2400 seconds of no output
[task 2020-03-12T15:25:10.822Z] 15:25:10    ERROR - Return code: -15
[task 2020-03-12T15:25:10.822Z] 15:25:10  WARNING - setting return code to 2
[task 2020-03-12T15:25:10.822Z] 15:25:10    FATAL - 'mach build -v' did not run successfully. Please check log for errors.
[task 2020-03-12T15:25:10.822Z] 15:25:10    FATAL - Running post_fatal callback...
[task 2020-03-12T15:25:10.822Z] 15:25:10    FATAL - Exiting -1
[task 2020-03-12T15:25:10.822Z] 15:25:10     INFO - [mozharness: 2020-03-12 15:25:10.822924Z] Finished build step (failed)
[task 2020-03-12T15:25:10.822Z] 15:25:10     INFO - Running post-run listener: _parse_build_tests_ccov
[task 2020-03-12T15:25:10.823Z] 15:25:10     INFO - Running command: ['/builds/worker/checkouts/gecko/obj-x86_64-pc-linux-gnu/_virtualenvs/init/bin/python', 'mach', 'python', 'testing/parse_build_tests_ccov.py'] in /builds/worker/checkouts/gecko
[task 2020-03-12T15:25:10.823Z] 15:25:10     INFO - Copy/paste: /builds/worker/checkouts/gecko/obj-x86_64-pc-linux-gnu/_virtualenvs/init/bin/python mach python testing/parse_build_tests_ccov.py
[task 2020-03-12T15:25:10.825Z] 15:25:10     INFO - Using env: {'ACCEPTED_MAR_CHANNEL_IDS': 'firefox-mozilla-beta,firefox-mozilla-release',
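
The 2400-second figure can be cross-checked against the task timestamps in the excerpt above. The following is a minimal sketch, not part of mozharness or mozprocess: `TASK_TS` and `largest_gap` are hypothetical names, and the sample lines are shortened copies of the log lines quoted above. It finds the longest silence between consecutive timestamped lines, which points at the last thing the build printed before it hung.

```python
import re
from datetime import datetime

# Matches the Taskcluster prefix seen in the log, e.g. "[task 2020-03-12T14:45:10.765Z]".
TASK_TS = re.compile(r"^\[task (\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z\]")


def largest_gap(lines):
    """Return (gap_seconds, line_before_gap) for the longest silence
    between consecutive timestamped lines."""
    prev_ts = None
    prev_line = None
    best = (0.0, None)
    for line in lines:
        match = TASK_TS.match(line)
        if not match:
            continue  # skip lines without a task timestamp
        ts = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%f")
        if prev_ts is not None:
            gap = (ts - prev_ts).total_seconds()
            if gap > best[0]:
                best = (gap, prev_line)
        prev_ts, prev_line = ts, line
    return best


# Shortened copies of the lines quoted in this bug (message text truncated).
sample = [
    "[task 2020-03-12T14:45:10.765Z] 14:45:10     INFO -  make[4]: Leaving directory",
    "[task 2020-03-12T15:25:10.787Z] 15:25:10     INFO - Automation Error: mozprocess timed out",
    "[task 2020-03-12T15:25:10.822Z] 15:25:10    ERROR - timed out after 2400 seconds of no output",
]

gap, last_line = largest_gap(sample)
print(f"{gap:.3f} s of silence after: {last_line}")
```

On these lines the gap is just over 2400 seconds and follows the `make[4]: Leaving directory` line, which matches the "no output" timeout reported in the log.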
Flags: needinfo?(mshal)
Regressions: 1620744
Summary: Perma [tier2] timed out after 2400 seconds of no output | Automation Error: mozprocess timed out after 2400 seconds running ['/builds/worker/checkouts/gecko/obj-x86_64-pc-linux-gnu/_virtualenvs/init/bin/python', 'mach'when Gecko 76 m → Perma [tier2] OS X Cross Compiled addon bustages timed out after 2400 seconds of no output when Gecko 76 merges to Beta on 2020-04-06

It doesn't look like it's related to my changes, though it's hard to tell since the build appears to hang and then time out from no output. Ricky recently investigated a Windows hang that had similar behavior, so maybe he has an idea of how to figure out what's going on here (or maybe bug 1622109 can help). While it's not likely to be the same underlying issue, since this is an OSX build, I think the same debugging techniques will help.

Flags: needinfo?(mshal) → needinfo?(rstewart)
See Also: → 1622496

I don't think it's related to bug 1515451. The try push still seems to fail with those patches backed out.

Flags: needinfo?(agashlin)
Flags: needinfo?(mh+mozilla)

Zibi, any chance this is from bug 1560038 ?

Flags: needinfo?(gandalf)

Bisection shows it's a regression from bug 1619461.

Flags: needinfo?(rstewart)
Flags: needinfo?(gandalf)
Flags: needinfo?(dmajor)
Regressed by: 1619461
No longer regressions: 1620744
Has Regression Range: --- → yes

(In reply to Sebastian Hengst [:aryx] (needinfo on intermittent or backout) from comment #8)

Bisection shows it's a regression from bug 1619461.

Thanks!

Could someone give me a step by step of how to reproduce this on try using a m-c base of my choice?

Based on https://wiki.mozilla.org/Sheriffing/How_To/Beta_simulations#TRUNK_AS_EARLY_BETA

hg update -r revofyourchoice
./mach try release -v 76.0b1 --tasks release-sim --migration central-to-beta --no-push
hg commit -m "Early beta config"
./mach try chooser --full

In the chooser, pick "opt" and "macOS add-on devel" (or whatever it is called).


Nice, that's very easy, thanks! I can reproduce the issue and am investigating.

I tracked this down to a compiler bug which I filed upstream as https://bugs.llvm.org/show_bug.cgi?id=45253.

In the meantime we'll need to work around this. There are various sizes of hammer we could use...

  1. Disable the flag for individual compilations on mac. I'm listing this for completeness, but I don't think we should do this. IIUC the problematic struct is in a header and the set of affected TUs can change.
  2. --disable-new-pass-manager for the OSX addon-devel build.
  3. Disable for all mac targets in configure.

I lean towards option 2, knowing that it's potentially whac-a-mole if this pops up in other build flavors, but I really don't want to undo the perf gains in shippable builds.

glandium, do you have any preference, or other ideas?

Flags: needinfo?(dmajor) → needinfo?(mh+mozilla)

Any idea why it only happens on those builds and not others? Could it be an early-beta-or-earlier thing that saves the beta, and it would fail even further down the release pipeline?
https://wiki.mozilla.org/Sheriffing/How_To/Beta_simulations#TRUNK_AS_LATE_BETA

Flags: needinfo?(mh+mozilla)

(In reply to Mike Hommey [:glandium] from comment #14)

Any idea why it only happens on those builds and not others?

I don't know. I assume that the pervasive code additions from debug, ccov, asan, and profiling make enough changes to avoid triggering the problem. I can tell you that it's not specific to addon-devel. If I trigger normal opt builds with a beta simulation, they time out too. The next question might be why beta opt builds fail and trunk opt builds succeed. I diffed the preprocessed sources and the main difference was DMD; however, --disable-dmd on trunk still didn't make us time out, so it's not that in isolation. At this point I don't have any energy left for investigating further builds, so we should figure out how to move forward.

Assignee: nobody → dmajor
Status: NEW → ASSIGNED
Pushed by dmajor@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/236b90d0f663 Disable new pass manager on non-pgo mac builds due to hangs r=glandium
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla76
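
The condition described by the landed commit message ("Disable new pass manager on non-pgo mac builds") can be sketched as a small predicate. This is only an illustration of the stated rule, not the actual configure code from the patch: `use_new_pass_manager` and its parameters are hypothetical names.

```python
def use_new_pass_manager(target_os: str, pgo: bool) -> bool:
    """Hypothetical predicate mirroring the commit message: keep the new
    pass manager everywhere except non-PGO mac builds, which hang."""
    if target_os == "OSX" and not pgo:
        return False
    return True


# Non-PGO mac builds (such as the opt builds that timed out here) fall back
# to the old pass manager; PGO mac builds and other targets are unaffected.
for target_os, pgo in [("OSX", False), ("OSX", True), ("Linux", False)]:
    print(target_os, pgo, use_new_pass_manager(target_os, pgo))
```

Keying on PGO rather than on a single build flavor avoids the whac-a-mole concern raised for option 2, while presumably leaving PGO builds, where the performance gains matter, unchanged.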

dmajor, does the improvement below make sense for this patch?
== Change summary for alert #25520 (as of Tue, 31 Mar 2020 07:16:38 GMT) ==

Improvements:

20% build times osx-cross debug taskcluster-c5d.4xlarge 1,828.03 -> 1,463.57

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=25520
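
As a quick check of the alert's arithmetic, the relative change between the two reported values can be computed directly. A minimal sketch, assuming both figures are in the same unit (presumably seconds of build time); `percent_improvement` is a hypothetical helper, not part of Perfherder.

```python
def percent_improvement(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, as a percentage."""
    return (before - after) / before * 100.0


# Values copied from alert #25520 above.
pct = percent_improvement(1828.03, 1463.57)
print(f"{pct:.1f}%")  # prints "19.9%", which the alert rounds to 20%
```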

Flags: needinfo?(dmajor)

Yes, this makes sense. If I set the graph to 30 days, this appears to counter the regression in alert #25331.

Flags: needinfo?(dmajor)