Closed Bug 1561474 Opened 5 years ago Closed 5 years ago

Intermittent bgo/run [taskcluster:error] Aborting task...

Categories

(Firefox Build System :: General, defect)

defect
Not set
normal

Tracking

(firefox-esr60 unaffected, firefox-esr68 unaffected, firefox68 unaffected, firefox69 fixed, firefox70 fixed)

RESOLVED FIXED
mozilla70

People

(Reporter: intermittent-bug-filer, Assigned: mshal)

Details

(Keywords: intermittent-failure, regression, Whiteboard: [stockwell disable-recommended])

Attachments

(2 files)

Filed by: csabou [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=253441116&repo=autoland
Full log: https://queue.taskcluster.net/v1/task/GJNVH_zbSCqZBoEzixtKUw/runs/0/artifacts/public/logs/live_backing.log


[vcs 2019-06-26T00:16:06.941Z] updating [==============================================> ] 271200/279523 49s
[vcs 2019-06-26T00:16:09.324Z] updating [==============================================> ] 271500/279523 47s
[vcs 2019-06-26T00:16:13.330Z] updating [==============================================> ] 271800/279523 46s
[vcs 2019-06-26T00:16:15.466Z] updating [==============================================> ] 271900/279523 48s
[vcs 2019-06-26T00:16:16.525Z] updating [==============================================> ] 272200/279523 46s
[taskcluster:error] Aborting task...
[taskcluster 2019-06-26T00:16:17.503Z] SUCCESS: The process with PID 4076 (child process of PID 1252) has been terminated.
[taskcluster 2019-06-26T00:16:17.503Z] SUCCESS: The process with PID 3896 (child process of PID 912) has been terminated.
[taskcluster 2019-06-26T00:16:17.503Z] SUCCESS: The process with PID 1252 (child process of PID 912) has been terminated.
[taskcluster 2019-06-26T00:16:17.503Z] SUCCESS: The process with PID 912 (child process of PID 2604) has been terminated.
[taskcluster 2019-06-26T00:16:17.503Z]
[taskcluster 2019-06-26T00:16:17.503Z] === Task Finished ===
[taskcluster 2019-06-26T00:16:17.503Z] Task Duration: 20m0.0650428s
[taskcluster:error] Uploading error artifact public/build/profdata.tar.xz from file build/src/artifacts/profdata.tar.xz with message "Could not read file 'Z:\task_1561498049\build\src\artifacts\profdata.tar.xz'", reason "file-missing-on-worker" and expiry 2020-06-24T23:03:07.896Z
[taskcluster:error] TASK FAILURE during artifact upload: file-missing-on-worker: Could not read file 'Z:\task_1561498049\build\src\artifacts\profdata.tar.xz'
[taskcluster:error] Uploading error artifact public/build/profile-run-1.log from file build/src/artifacts/profile-run-1.log with message "Could not read file 'Z:\task_1561498049\build\src\artifacts\profile-run-1.log'", reason "file-missing-on-worker" and expiry 2020-06-24T23:03:07.896Z
[taskcluster:error] TASK FAILURE during artifact upload: file-missing-on-worker: Could not read file 'Z:\task_1561498049\build\src\artifacts\profile-run-1.log'
[taskcluster:error] Uploading error artifact public/build/profile-run-2.log from file build/src/artifacts/profile-run-2.log with message "Could not read file 'Z:\task_1561498049\build\src\artifacts\profile-run-2.log'", reason "file-missing-on-worker" and expiry 2020-06-24T23:03:07.896Z
[taskcluster:error] TASK FAILURE during artifact upload: file-missing-on-worker: Could not read file 'Z:\task_1561498049\build\src\artifacts\profile-run-2.log'
[taskcluster 2019-06-26T00:16:19.011Z] [mounts] Preserving cache: Moving "Z:\task_1561498049\build" to "Z:\caches\ER9TlNX1T1GKloovEhAybQ"
[taskcluster 2019-06-26T00:16:19.622Z] [mounts] Denying task_1561498049 access to 'Z:\caches\ER9TlNX1T1GKloovEhAybQ'
[taskcluster 2019-06-26T00:17:00.481Z] Uploading redirect artifact public/logs/live.log to URL https://queue.taskcluster.net/v1/task/GJNVH_zbSCqZBoEzixtKUw/runs/0/artifacts/public/logs/live_backing.log with mime type "text/plain; charset=utf-8" and expiry 2020-06-24T23:03:07.896Z
[taskcluster:error] Task aborted - max run time exceeded
[taskcluster:error] file-missing-on-worker: Could not read file 'Z:\task_1561498049\build\src\artifacts\profdata.tar.xz'
[taskcluster:error] file-missing-on-worker: Could not read file 'Z:\task_1561498049\build\src\artifacts\profile-run-1.log'
[taskcluster:error] file-missing-on-worker: Could not read file 'Z:\task_1561498049\build\src\artifacts\profile-run-2.log'

Looks like it took the whole task runtime just trying to clone the tree. If this happens more frequently, we can try just bumping the timeout to ignore the issue, or look into why cloning is slow. :sheehan, any ideas what might've happened here?

Flags: needinfo?(sheehan)
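For anyone triaging similar logs, here is a rough sketch of how the `[vcs ...] updating` progress lines in the excerpt above could be turned into a checkout rate. The function names (`parse_progress`, `checkout_rate`) and the regular expression are illustrative assumptions, not part of any Taskcluster or Mercurial tooling, and the sample only covers the last few lines before the abort, so it says nothing about how fast the earlier part of the checkout ran.

```python
import re
from datetime import datetime

# Assumed pattern for the progress lines shown in the log excerpt above:
# "[vcs <timestamp>] updating [===> ] <done>/<total> <eta>"
PROGRESS_RE = re.compile(
    r"\[vcs (?P<ts>\S+)\] updating \[[=> ]*\] (?P<done>\d+)/(?P<total>\d+)"
)


def parse_progress(lines):
    """Return (timestamp, files_done, files_total) for each progress line."""
    samples = []
    for line in lines:
        match = PROGRESS_RE.search(line)
        if match:
            ts = datetime.strptime(match.group("ts"), "%Y-%m-%dT%H:%M:%S.%fZ")
            samples.append((ts, int(match.group("done")), int(match.group("total"))))
    return samples


def checkout_rate(samples):
    """Average files per second between the first and last sample."""
    (t0, d0, _), (t1, d1, _) = samples[0], samples[-1]
    elapsed = (t1 - t0).total_seconds()
    return (d1 - d0) / elapsed if elapsed > 0 else 0.0


# The five progress lines copied from the log excerpt in this bug.
LOG = [
    "[vcs 2019-06-26T00:16:06.941Z] updating [==============================================> ] 271200/279523 49s",
    "[vcs 2019-06-26T00:16:09.324Z] updating [==============================================> ] 271500/279523 47s",
    "[vcs 2019-06-26T00:16:13.330Z] updating [==============================================> ] 271800/279523 46s",
    "[vcs 2019-06-26T00:16:15.466Z] updating [==============================================> ] 271900/279523 48s",
    "[vcs 2019-06-26T00:16:16.525Z] updating [==============================================> ] 272200/279523 46s",
]

samples = parse_progress(LOG)
rate = checkout_rate(samples)
_, done, total = samples[-1]
remaining = total - done  # files still to be written when the task was aborted

print(f"{rate:.1f} files/s over the sampled window, {remaining} files remaining")
```

Running the same parsing over the full `live_backing.log` rather than this short tail would show whether the checkout was uniformly slow or stalled at particular points.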

(In reply to Michael Shal [:mshal] from comment #1)

> Looks like it took the whole task runtime just trying to clone the tree. If this happens more frequently, we can try just bumping the timeout to ignore the issue, or look into why cloning is slow. :sheehan, any ideas what might've happened here?

At first glance, there are a few things I notice that seem wrong here:

Given the second point, I think this is an issue with filesystem performance. I believe EC2 has these kinds of issues when a new instance is bootstrapped from an AMI (the first read/write is extremely slow IIRC). The first point in the above list may be contributing as well.

Flags: needinfo?(sheehan)
Depends on: 1563785

This bug failed 46 times in the last 7 days, on windows2012-64-shippable and windows2012-32-shippable, on opt build types.

Recent log:
https://treeherder.mozilla.org/logviewer.html#?job_id=255041763&repo=autoland

Assignee: nobody → mshal

This was originally from bug 1528374 for Mac PGO, but that isn't able to
land yet and it should help Windows PGO runs in the meantime.

Pushed by mshal@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/d501a16ad338
Define and use a sparse profile for the profile-run task; r=sheehan
https://hg.mozilla.org/integration/autoland/rev/155d964c8be4
Use sparse-profile on Windows generate tasks; r=firefox-build-system-reviewers,chmanchester
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla70

Looks like this is hitting Beta also. Please request approval if you're comfortable doing so.

Flags: needinfo?(mshal)

Comment on attachment 9077560 [details]
Bug 1561474 - Use sparse-profile on Windows generate tasks; r?#firefox-build-system-reviewers

Beta/Release Uplift Approval Request

  • User impact if declined: Impacts sheriffing the beta tree, since Windows 'run' tasks will intermittently fail without these patches. Does not impact the final shipped product.
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): This change modifies how many files are checked out by hg before the profile generation step runs. If there are any issues importing the patch, it should be an obvious failure when the 'run' task is scheduled.
  • String changes made/needed: none
Flags: needinfo?(mshal)
Attachment #9077560 - Flags: approval-mozilla-beta?
Attachment #9077559 - Flags: approval-mozilla-beta?

Comment on attachment 9077559 [details]
Bug 1561474 - Define and use a sparse profile for the profile-run task; r?sheehan

Fixes some intermittent build failures on Windows. Doesn't affect the resulting builds, just how much we check out at the start of the build. Approved for 69.0b6.

Attachment #9077559 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #9077560 - Flags: approval-mozilla-beta? → approval-mozilla-beta+