Permafail Windows coverage build running out of time
Categories
(Testing :: Code Coverage, defect, P1)
Tracking
(firefox-esr60 unaffected, firefox66 unaffected, firefox67 unaffected, firefox68 fixed)
Tracking | Status | |
---|---|---|
firefox-esr60 | --- | unaffected |
firefox66 | --- | unaffected |
firefox67 | --- | unaffected |
firefox68 | --- | fixed |
People
(Reporter: intermittent-bug-filer, Assigned: ahal)
References
(Regression)
Details
(Keywords: intermittent-failure, regression)
Attachments
(1 file)
#[markdown(off)]
Filed by: btara [at] mozilla.com
https://treeherder.mozilla.org/logviewer.html#?job_id=238345090&repo=mozilla-central
11:04:40 INFO - package-tests> mozmake.EXE[2]: Leaving directory 'z:/build/build/src/obj-firefox'
[taskcluster 2019-04-05T11:10:27.663Z] [taskcluster-proxy] Successfully refreshed taskcluster-proxy credentials: task-client/LSSyyg_HSzOnM9bVve7K-w/0/on/us-west-1/i-0f6ccd032e967af2c/until/1554463828.376
[taskcluster 2019-04-05T11:27:29.123Z] [taskcluster-proxy] Successfully refreshed taskcluster-proxy credentials: task-client/LSSyyg_HSzOnM9bVve7K-w/0/on/us-west-1/i-0f6ccd032e967af2c/until/1554464849.637
[taskcluster 2019-04-05T11:44:30.293Z] [taskcluster-proxy] Successfully refreshed taskcluster-proxy credentials: task-client/LSSyyg_HSzOnM9bVve7K-w/0/on/us-west-1/i-0f6ccd032e967af2c/until/1554465870.715
[taskcluster 2019-04-05T12:01:31.456Z] [taskcluster-proxy] Successfully refreshed taskcluster-proxy credentials: task-client/LSSyyg_HSzOnM9bVve7K-w/0/on/us-west-1/i-0f6ccd032e967af2c/until/1554466891.74
[taskcluster:error] Aborting task...
[taskcluster 2019-04-05T12:02:23.898Z] ERROR: The process "1316" not found.
[taskcluster 2019-04-05T12:02:23.898Z]
[taskcluster:warn 2019-04-05T12:02:23.898Z] exit status 128
[taskcluster 2019-04-05T12:02:23.935Z] ERROR: The process "3064" not found.
[taskcluster 2019-04-05T12:02:23.935Z]
[taskcluster:warn 2019-04-05T12:02:23.935Z] exit status 128
[taskcluster 2019-04-05T12:02:23.969Z] ERROR: The process "1436" not found.
[taskcluster 2019-04-05T12:02:23.969Z]
[taskcluster:warn 2019-04-05T12:02:23.969Z] exit status 128
[taskcluster 2019-04-05T12:02:24.003Z] ERROR: The process "3956" not found.
[taskcluster 2019-04-05T12:02:24.003Z]
[taskcluster:warn 2019-04-05T12:02:24.003Z] exit status 128
[taskcluster 2019-04-05T12:02:24.038Z] ERROR: The process "4028" not found.
[taskcluster 2019-04-05T12:02:24.038Z]
[taskcluster:warn 2019-04-05T12:02:24.038Z] exit status 128
[taskcluster 2019-04-05T12:02:24.072Z] ERROR: The process "3396" not found.
[taskcluster 2019-04-05T12:02:24.072Z]
[taskcluster:warn 2019-04-05T12:02:24.072Z] exit status 128
[taskcluster 2019-04-05T12:02:24.107Z] ERROR: The process "2508" not found.
[taskcluster 2019-04-05T12:02:24.107Z]
[taskcluster:warn 2019-04-05T12:02:24.107Z] exit status 128
[taskcluster 2019-04-05T12:02:24.146Z] SUCCESS: The process with PID 3484 (child process of PID 4228) has been terminated.
[taskcluster 2019-04-05T12:02:24.146Z] SUCCESS: The process with PID 4228 (child process of PID 3368) has been terminated.
[taskcluster 2019-04-05T12:02:24.146Z] SUCCESS: The process with PID 3368 (child process of PID 680) has been terminated.
[taskcluster 2019-04-05T12:02:24.146Z] SUCCESS: The process with PID 680 (child process of PID 4964) has been terminated.
[taskcluster 2019-04-05T12:02:24.146Z] SUCCESS: The process with PID 4964 (child process of PID 1500) has been terminated.
[taskcluster 2019-04-05T12:02:24.146Z] SUCCESS: The process with PID 1500 (child process of PID 4568) has been terminated.
[taskcluster 2019-04-05T12:02:24.146Z] SUCCESS: The process with PID 4568 (child process of PID 3824) has been terminated.
[taskcluster 2019-04-05T12:02:24.146Z] SUCCESS: The process with PID 3824 (child process of PID 4280) has been terminated.
[taskcluster 2019-04-05T12:02:24.146Z] SUCCESS: The process with PID 4280 (child process of PID 732) has been terminated.
[taskcluster 2019-04-05T12:02:24.146Z] SUCCESS: The process with PID 732 (child process of PID 3776) has been terminated.
[taskcluster 2019-04-05T12:02:24.146Z] SUCCESS: The process with PID 3776 (child process of PID 2176) has been terminated.
[taskcluster 2019-04-05T12:02:24.146Z] SUCCESS: The process with PID 2176 (child process of PID 3404) has been terminated.
[taskcluster 2019-04-05T12:02:24.146Z] SUCCESS: The process with PID 3404 (child process of PID 700) has been terminated.
[taskcluster 2019-04-05T12:02:24.146Z] SUCCESS: The process with PID 3540 (child process of PID 3600) has been terminated.
[taskcluster 2019-04-05T12:02:24.146Z] SUCCESS: The process with PID 700 (child process of PID 3600) has been terminated.
[taskcluster 2019-04-05T12:02:24.146Z] SUCCESS: The process with PID 3600 (child process of PID 2484) has been terminated.
[taskcluster 2019-04-05T12:02:24.146Z]
[taskcluster 2019-04-05T12:02:24.146Z] === Task Finished ===
[taskcluster 2019-04-05T12:02:24.146Z] Task Duration: 2h0m1.4955177s
[taskcluster:error] Uploading error artifact public/code-coverage-grcov.zip from file build\src\obj-firefox\code-coverage-grcov.zip with message "Could not read file 'Z:\\task_1554458424\\build\\src\\obj-firefox\\code-coverage-grcov.zip'", reason "file-missing-on-worker" and expiry 2020-04-04T09:52:14.758Z
[taskcluster:error] TASK FAILURE during artifact upload: file-missing-on-worker: Could not read file 'Z:\task_1554458424\build\src\obj-firefox\code-coverage-grcov.zip'
[taskcluster 2019-04-05T12:02:25.542Z] Uploading artifact public/logs/localconfig.json from file logs\localconfig.json with content encoding "gzip", mime type "application/octet-stream" and expiry 2020-04-04T09:52:14.758Z
[taskcluster 2019-04-05T12:02:26.183Z] Uploading artifact public/build/target.awsy.tests.tar.gz from file public\build\target.awsy.tests.tar.gz with content encoding "", mime type "application/x-gzip" and expiry 2020-04-04T09:52:14.758Z
[taskcluster 2019-04-05T12:02:26.640Z] Uploading artifact public/build/target.common.tests.tar.gz from file public\build\target.common.tests.tar.gz with content encoding "", mime type "application/x-gzip" and expiry 2020-04-04T09:52:14.758Z
[taskcluster 2019-04-05T12:02:27.580Z] Uploading artifact public/build/target.cppunittest.tests.tar.gz from file public\build\target.cppunittest.tests.tar.gz with content encoding "", mime type "application/x-gzip" and expiry 2020-04-04T09:52:14.758Z
[taskcluster 2019-04-05T12:02:28.616Z] Uploading artifact public/build/target.gtest.tests.tar.gz from file public\build\target.gtest.tests.tar.gz with content encoding "", mime type "application/x-gzip" and expiry 2020-04-04T09:52:14.758Z
[taskcluster 2019-04-05T12:02:34.317Z] Uploading artifact public/build/target.mochitest.tests.tar.gz from file public\build\target.mochitest.tests.tar.gz with content encoding "", mime type "application/x-gzip" and expiry 2020-04-04T09:52:14.758Z
[taskcluster 2019-04-05T12:02:35.813Z] Uploading artifact public/build/target.raptor.tests.tar.gz from file public\build\target.raptor.tests.tar.gz with content encoding "", mime type "application/x-gzip" and expiry 2020-04-04T09:52:14.758Z
[taskcluster 2019-04-05T12:02:36.528Z] Uploading artifact public/build/target.reftest.tests.tar.gz from file public\build\target.reftest.tests.tar.gz with content encoding "", mime type "application/x-gzip" and expiry 2020-04-04T09:52:14.758Z
[taskcluster 2019-04-05T12:02:38.543Z] Uploading artifact public/build/target.talos.tests.tar.gz from file public\build\target.talos.tests.tar.gz with content encoding "", mime type "application/x-gzip" and expiry 2020-04-04T09:52:14.758Z
[taskcluster 2019-04-05T12:02:39.298Z] Uploading artifact public/build/target.updater-dep.tests.tar.gz from file public\build\target.updater-dep.tests.tar.gz with content encoding "", mime type "application/x-gzip" and expiry 2020-04-04T09:52:14.758Z
[taskcluster 2019-04-05T12:02:39.683Z] Uploading artifact public/build/target.web-platform.tests.tar.gz from file public\build\target.web-platform.tests.tar.gz with content encoding "", mime type "application/x-gzip" and expiry 2020-04-04T09:52:14.758Z
[taskcluster 2019-04-05T12:02:41.130Z] Uploading artifact public/build/target.xpcshell.tests.tar.gz from file public\build\target.xpcshell.tests.tar.gz with content encoding "", mime type "application/x-gzip" and expiry 2020-04-04T09:52:14.758Z
[taskcluster 2019-04-05T12:02:41.820Z] Uploading artifact public/logs/certified.log from file generic-worker\certified.log with content encoding "gzip", mime type "text/plain; charset=utf-8" and expiry 2020-04-04T09:52:14.758Z
[taskcluster 2019-04-05T12:02:43.612Z] Uploading artifact public/chain-of-trust.json from file generic-worker\chain-of-trust.json with content encoding "gzip", mime type "text/plain; charset=utf-8" and expiry 2020-04-04T09:52:14.758Z
[taskcluster 2019-04-05T12:02:43.985Z] Uploading artifact public/chain-of-trust.json.sig from file generic-worker\chain-of-trust.json.sig with content encoding "gzip", mime type "application/octet-stream" and expiry 2020-04-04T09:52:14.758Z
[taskcluster 2019-04-05T12:02:44.372Z] Uploading artifact public/chainOfTrust.json.asc from file generic-worker\chainOfTrust.json.asc with content encoding "gzip", mime type "text/plain; charset=utf-8" and expiry 2020-04-04T09:52:14.758Z
[taskcluster 2019-04-05T12:02:44.943Z] Uploading redirect artifact public/logs/live.log to URL https://queue.taskcluster.net/v1/task/LSSyyg_HSzOnM9bVve7K-w/runs/0/artifacts/public/logs/live_backing.log with mime type "text/plain; charset=utf-8" and expiry 2020-04-04T09:52:14.758Z
[taskcluster:error] Task aborted - max run time exceeded
[taskcluster:error] file-missing-on-worker: Could not read file 'Z:\task_1554458424\build\src\obj-firefox\code-coverage-grcov.zip'
Updated•5 years ago
|
Comment 1•5 years ago
|
||
Comment 2•5 years ago
|
||
This is a permafailure. Since this push: https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&searchStr=ccov&revision=a765634b9ec26b122c094ad5def8e70ea5c23e17.
The runtime before that push was ~80 minutes, now it breaks the 120 minutes barrier.
Comment 3•5 years ago
|
||
I've triggered ccov builds for some of the autoland pushes from that push, and https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=c28c538fca0a&searchStr=ccov is the first failing one.
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 5•5 years ago
|
||
Looks like the relevant part of the log is near the end:
14:05:49 INFO - package-tests> Wrote 72285 files in 54373432 bytes to target.web-platform.tests.tar.gz in 39.02s
14:05:49 INFO - package-tests> mozmake.EXE[2]: Leaving directory 'z:/build/build/src/obj-firefox'
[taskcluster 2019-04-07T14:18:02.388Z] [taskcluster-proxy] Successfully refreshed taskcluster-proxy credentials: task-client/fHyuCZfgQpe6BkY5crxuJg/0/on/us-west-2/i-02a455cadb66f0fc1/until/1554647882.465
[taskcluster 2019-04-07T14:35:03.170Z] [taskcluster-proxy] Successfully refreshed taskcluster-proxy credentials: task-client/fHyuCZfgQpe6BkY5crxuJg/0/on/us-west-2/i-02a455cadb66f0fc1/until/1554648903.22
[taskcluster 2019-04-07T14:52:03.886Z] [taskcluster-proxy] Successfully refreshed taskcluster-proxy credentials: task-client/fHyuCZfgQpe6BkY5crxuJg/0/on/us-west-2/i-02a455cadb66f0fc1/until/1554649923.912
[taskcluster 2019-04-07T15:09:04.598Z] [taskcluster-proxy] Successfully refreshed taskcluster-proxy credentials: task-client/fHyuCZfgQpe6BkY5crxuJg/0/on/us-west-2/i-02a455cadb66f0fc1/until/1554650944.584
[taskcluster:error] Aborting task...
[taskcluster 2019-04-07T15:10:01.465Z] ERROR: The process "2556" not found.
[taskcluster 2019-04-07T15:10:01.465Z]
...
[taskcluster 2019-04-07T15:10:01.723Z] === Task Finished ===
Pay close attention to the timestamps. So the build finishes packaging web-platform-tests and then there are four Successfully refreshed taskcluster-proxy credentials
log messages, and a 15 minute gap between each one. After an hour we see Aborting task...
. It looks like that taskcluster refresh log happens periodically throughout the build process, so I assume there is an actual hang happening here (and not just a case of making the builds slower).
Marco, there are two high level changes that the regressing bug did:
- Update the copies of mozprocess, mozinfo, mozfile and manifestparser that mozharness uses
- Call the builds via
mach python
instead ofpython2.7
(which could potentially cause different versions of packages to be used)
Marco, do either of those sound particularly suspicious?
Assignee | ||
Comment 6•5 years ago
|
||
I checked a couple failing builds and they all seem to hang after packaging web-platform tests. In a normal passing build, the next couple log lines should have been:
14:23:16 INFO - package-tests> Wrote 72285 files in 54373432 bytes to target.web-platform.tests.tar.gz in 59.50s
14:23:16 INFO - package-tests> mozmake.EXE[2]: Leaving directory 'z:/build/build/src/obj-firefox'
14:26:27 INFO - package> mozmake.EXE[5]: Entering directory 'z:/build/build/src/obj-firefox/browser/installer'
14:26:27 INFO - package> NO_PKG_FILES="core bsdecho js js-config jscpucfg nsinstall viewer TestGtkEmbed elf-dynstr-gc mangle* maptsv* mfc* msdump* msmap* nm2tsv* nsinstall* res/samples res/throbber shlibsign* certutil* pk12util* BadCertServer* OCSPStaplingServer* SymantecSanctionsServer* GenerateOCSPResponse* chrome/chrome.rdf chrome/app-chrome.manifest chrome/overlayinfo components/compreg.dat components/xpti.dat content_unit_tests necko_unit_tests *.dSYM SmokeDMD"
14:26:27 INFO - package> z:/build/build/src/obj-firefox/_virtualenvs/init/Scripts/python.exe z:/build/build/src/toolkit/mozapps/installer/packager.py
Comment 7•5 years ago
|
||
Initially I was thinking of https://searchfox.org/mozilla-central/rev/6db0a6a56653355fcbb25c4fa79c6e7ffc6f88e9/testing/mozharness/mozharness/mozilla/building/buildbase.py#1679 (which is using 'mach python'), but it doesn't seem to fit.
Assignee | ||
Comment 8•5 years ago
|
||
The Wrote .. to target.web-platform.tests
line comes from here:
https://searchfox.org/mozilla-central/source/python/mozbuild/mozbuild/action/test_archive.py#820
Which gets called from here:
https://searchfox.org/mozilla-central/source/testing/testsuite-targets.mk#158
Reading makefiles is not my forte though. I'm also confused why ccov builds would be different from normal ones here.
Comment 9•5 years ago
|
||
I tried to disable Rust tests, which is one of the differences between normal builds and ccov builds, but it didn't make a difference: https://treeherder.mozilla.org/#/jobs?repo=try&revision=5de35c528c768271487823284b9d23d1d32ce46e.
Assignee | ||
Comment 10•5 years ago
|
||
I ran a few try runs:
https://treeherder.mozilla.org/#/jobs?repo=try&author=ahalberstadt%40mozilla.com&fromchange=87f98216ee979df1af96e598e7322130d16499e2&tochange=c4c688b2ec2723fd4a269823eade5bf4edfd5c8f
The bottom is baseline (aka current central), then I backed out the two patches one by one. In otherwords, this is the specific commit that caused the regression:
https://hg.mozilla.org/mozilla-central/rev/50495a5da9f6
Since this is Windows only, it's likely this hunk specifically:
https://hg.mozilla.org/mozilla-central/diff/50495a5da9f6/taskcluster/taskgraph/transforms/job/mozharness.py
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Comment 11•5 years ago
|
||
Have it working on try using PYTHONPATH rather than mach python
:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=5def23c2c7f27d9a4831db0daf292cf03487032e
I filed bug 1543149 to figure out why using mach python
breaks the ccov build. I think we'll need someone on the build team to investigate as I wasn't getting anywhere.
Assignee | ||
Comment 12•5 years ago
|
||
This is a hack to get around Windows ccov build hangs caused by bug 1195299.
Bug 1543149 will track the investigation of the hang and removal of this hack.
Comment 13•5 years ago
|
||
Pushed by ahalberstadt@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/20c86313e332 [taskgraph] Hack required mozbase module onto PYTHONPATH rather than run 'mach python' for Windows builds, r=marco
Comment 14•5 years ago
|
||
bugherder |
Updated•5 years ago
|
Updated•2 years ago
|
Description
•