Closed Bug 1527134 Opened 5 years ago Closed 5 years ago

Intermittent [taskcluster:error] exit status 1073807364

Categories

(Infrastructure & Operations :: RelOps: OpenCloudConfig, task, P5)

x86_64
Windows 10

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: intermittent-bug-filer, Assigned: grenade)

References

Details

(Keywords: intermittent-failure)

Filed by: btara [at] mozilla.com

https://treeherder.mozilla.org/logviewer.html#?job_id=227748563&repo=mozilla-release

https://queue.taskcluster.net/v1/task/Mv77YdcwQEeBu6gOHqlHnA/runs/0/artifacts/public/logs/live_backing.log

The failure summary is the same as bug 1485628 but the log looks different.

21:59:59 INFO - TEST-START | dom/tests/mochitest/fetch/test_fetch_observer.html
22:00:00 INFO - GECKO(192) | ++DOMWINDOW == 14 (0000024D93CA9C00) [pid = 11184] [serial = 64] [outer = 0000024D932B8400]
22:00:00 INFO - GECKO(192) | ++DOCSHELL 0000024D8B941800 == 4 [pid = 11184] [id = {f5b3d361-98eb-4a90-a6ec-2d48b8313f70}]
22:00:00 INFO - GECKO(192) | ++DOMWINDOW == 15 (0000024D93CA5800) [pid = 11184] [serial = 65] [outer = 0000000000000000]
22:00:00 INFO - GECKO(192) | ++DOMWINDOW == 16 (0000024D95329800) [pid = 11184] [serial = 66] [outer = 0000024D93CA5800]
22:00:00 INFO - GECKO(192) | JavaScript error: http://mochi.test:8888/tests/dom/tests/mochitest/fetch/file_fetch_observer.html, line 24: AbortError: The operation was aborted.
[taskcluster 2019-02-11T22:00:00.846Z] Exit Code: 1073807364
[taskcluster 2019-02-11T22:00:00.846Z] User Time: 15.625ms
[taskcluster 2019-02-11T22:00:00.846Z] Kernel Time: 0s
[taskcluster 2019-02-11T22:00:00.846Z] Wall Time: 30m49.6471419s
[taskcluster 2019-02-11T22:00:00.846Z] Result: FAILED
[taskcluster 2019-02-11T22:00:00.846Z] === Task Finished ===
[taskcluster 2019-02-11T22:00:00.846Z] Task Duration: 30m49.6481212s
[taskcluster 2019-02-11T22:00:01.323Z] Uploading artifact public/logs/localconfig.json from file logs\localconfig.json with content encoding "gzip", mime type "application/octet-stream" and expiry 2020-02-11T20:29:50.088Z
[taskcluster 2019-02-11T22:00:01.940Z] Uploading artifact public/logs/log_critical.log from file logs\log_critical.log with content encoding "gzip", mime type "text/plain" and expiry 2020-02-11T20:29:50.088Z
[taskcluster 2019-02-11T22:00:02.280Z] Uploading artifact public/logs/log_error.log from file logs\log_error.log with content encoding "gzip", mime type "text/plain" and expiry 2020-02-11T20:29:50.088Z
[taskcluster 2019-02-11T22:00:02.630Z] Uploading artifact public/logs/log_fatal.log from file logs\log_fatal.log with content encoding "gzip", mime type "text/plain" and expiry 2020-02-11T20:29:50.088Z
[taskcluster 2019-02-11T22:00:03.018Z] Uploading artifact public/logs/log_info.log from file logs\log_info.log with content encoding "gzip", mime type "text/plain" and expiry 2020-02-11T20:29:50.088Z
[taskcluster 2019-02-11T22:00:04.036Z] Uploading artifact public/logs/log_raw.log from file logs\log_raw.log with content encoding "gzip", mime type "text/plain" and expiry 2020-02-11T20:29:50.088Z
[taskcluster 2019-02-11T22:00:04.677Z] Uploading artifact public/logs/log_warning.log from file logs\log_warning.log with content encoding "gzip", mime type "text/plain" and expiry 2020-02-11T20:29:50.088Z
[taskcluster 2019-02-11T22:00:05.009Z] Uploading artifact public/test_info/manifests.list from file build\blobber_upload_dir\manifests.list with content encoding "gzip", mime type "application/octet-stream" and expiry 2020-02-11T20:29:50.088Z
[taskcluster 2019-02-11T22:00:05.422Z] Uploading artifact public/test_info/plain-chunked_errorsummary.log from file build\blobber_upload_dir\plain-chunked_errorsummary.log with content encoding "gzip", mime type "text/plain" and expiry 2020-02-11T20:29:50.088Z
[taskcluster 2019-02-11T22:00:05.794Z] Uploading artifact public/test_info/plain-chunked_raw.log from file build\blobber_upload_dir\plain-chunked_raw.log with content encoding "gzip", mime type "text/plain" and expiry 2020-02-11T20:29:50.088Z
[taskcluster 2019-02-11T22:00:06.402Z] Uploading artifact public/test_info/system-info.log from file build\blobber_upload_dir\system-info.log with content encoding "gzip", mime type "text/plain" and expiry 2020-02-11T20:29:50.088Z
[taskcluster 2019-02-11T22:00:06.993Z] Uploading redirect artifact public/logs/live.log to URL https://queue.taskcluster.net/v1/task/Mv77YdcwQEeBu6gOHqlHnA/runs/0/artifacts/public/logs/live_backing.log with mime type "text/plain; charset=utf-8" and expiry 2020-02-11T20:29:50.088Z
[taskcluster:error] exit status 1073807364

 22:00:00 INFO - GECKO(192) | JavaScript error: http://mochi.test:8888/tests/dom/tests/mochitest/fetch/file_fetch_observer.html, line 24: AbortError: The operation was aborted.

Looks like a test failed, not a worker issue.

Flags: needinfo?(btara)

:dustin
There are more failures like this one (same exit status code) on other trees too.

eg: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=227749381&repo=autoland&lineNumber=9075

Flags: needinfo?(btara)

Pete, some googling suggests this is Windows-ese for a host shutting down. Is this spot termination, by chance?

Flags: needinfo?(pmoore)
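Dustin's reading can be checked by decoding the exit status. In hex, 1073807364 is 0x40010004, which the Windows SDK header ntstatus.h defines as DBG_TERMINATE_PROCESS. It is an informational status commonly reported for processes killed while the system is shutting down. The minimal sketch below splits the value into the standard NTSTATUS fields. The lookup table holds only this one constant, and the function name is illustrative.

```python
# Decode a Windows process exit status into its NTSTATUS fields.
# Layout: bits 31-30 severity, bit 29 customer flag, bit 28 reserved,
# bits 27-16 facility, bits 15-0 code.

SEVERITY_NAMES = {0: "success", 1: "informational", 2: "warning", 3: "error"}

# Only the constant relevant to this bug; value as defined in ntstatus.h.
KNOWN_STATUSES = {0x40010004: "DBG_TERMINATE_PROCESS"}


def decode_ntstatus(exit_status: int) -> dict:
    """Split a 32-bit exit status into NTSTATUS fields."""
    value = exit_status & 0xFFFFFFFF  # normalise signed values to 32 bits
    return {
        "hex": f"0x{value:08X}",
        "severity": SEVERITY_NAMES[(value >> 30) & 0x3],
        "customer": bool((value >> 29) & 0x1),
        "facility": (value >> 16) & 0xFFF,  # 1 is the debugger facility
        "code": value & 0xFFFF,
        "name": KNOWN_STATUSES.get(value, "unknown"),
    }


if __name__ == "__main__":
    print(decode_ntstatus(1073807364))
```

The informational severity is why the worker shows a large positive number rather than a typical error code. The process did not crash; it was told to terminate.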

(In reply to Dustin J. Mitchell [:dustin] pronoun: he from comment #4)

Pete, some googling suggests this is Windows-ese for a host shutting down. Is this spot termination, by chance?

It is indeed a system reboot, and it looks to be triggered by Windows Update (which is supposed to be disabled; we've seen this before in bug 1485628).

I am investigating now.

Assignee: nobody → rthijssen
Status: NEW → ASSIGNED
Component: General → RelOps: OpenCloudConfig
Flags: needinfo?(pmoore)
OS: Unspecified → Windows 10
Product: Taskcluster → Infrastructure & Operations
QA Contact: rthijssen
Hardware: Unspecified → x86_64
See Also: → 1485628

Argh! Some configuration was missing from the gecko-t-win10-64 manifest: the part that sets the registry keys that disable updates on Windows 10.

I've reinstated the missing config and redeployed the worker type. It will take 40 minutes or so before the rebooting machines are purged and replaced with better-behaved ones.
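A regression like this could be caught by checking the update policy on a running worker. The bug does not list the keys the manifest sets. The sketch below assumes the documented Group Policy value `NoAutoUpdate` (DWORD 1 disables automatic updates) under the `WindowsUpdate\AU` policy key, which may differ from what OpenCloudConfig actually writes. The function names are hypothetical.

```python
# Hypothetical check that the Windows Update policy is in place on a worker.
# ASSUMPTION: the manifest sets NoAutoUpdate=1 under this key; the bug
# does not say which keys OpenCloudConfig actually writes.

AU_POLICY_KEY = r"SOFTWARE\Policies\Microsoft\Windows\WindowsUpdate\AU"

# Expected policy values (assumption): NoAutoUpdate = 1 disables auto updates.
EXPECTED_VALUES = {"NoAutoUpdate": 1}


def missing_update_settings(actual: dict) -> list:
    """Return expected value names that are absent or set differently."""
    return sorted(
        name
        for name, expected in EXPECTED_VALUES.items()
        if actual.get(name) != expected
    )


def read_au_policy() -> dict:
    """Read the expected AU policy values from HKLM (Windows only)."""
    import winreg  # standard library, but only available on Windows

    values = {}
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, AU_POLICY_KEY) as key:
            for name in EXPECTED_VALUES:
                try:
                    values[name], _value_type = winreg.QueryValueEx(key, name)
                except FileNotFoundError:
                    pass  # this value is not set
    except FileNotFoundError:
        pass  # the policy key does not exist at all
    return values
```

On a worker, `missing_update_settings(read_au_policy())` returning an empty list would mean the assumed policy value is present. Keeping the comparison separate from the registry read lets the logic be tested on any platform.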

Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED

Just a note that the patch succeeded: the last occurrence of "1073807364" in the event logs is at 2019-02-12 06:14:13 UTC (5 hours ago).

https://papertrailapp.com/groups/2488493/events?q=1073807364
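Papertrail runs this search across the fleet's aggregated logs. For a single downloaded task log, a rough local stand-in is to scan for the same number and keep the timestamp of the last matching line. The sketch assumes the `[taskcluster <timestamp>]` prefix shown in the log above. Lines without that prefix (such as `[taskcluster:error] ...`) are skipped. The function name is hypothetical.

```python
import re
from typing import Iterable, Optional

# Matches the taskcluster log prefix, e.g. "[taskcluster 2019-02-11T22:00:00.846Z] ..."
TIMESTAMP = re.compile(r"^\[taskcluster (\S+)\]")


def last_occurrence(lines: Iterable[str], needle: str = "1073807364") -> Optional[str]:
    """Return the timestamp of the last timestamped line containing `needle`."""
    last = None
    for line in lines:
        if needle in line:
            match = TIMESTAMP.match(line)
            if match:  # ignore matching lines that carry no timestamp
                last = match.group(1)
    return last
```

For example, `last_occurrence(open("live_backing.log", encoding="utf-8"))` would return `"2019-02-11T22:00:00.846Z"` for the log in this bug, or `None` if the status never appears.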

Status: RESOLVED → REOPENED
Resolution: FIXED → ---

These failures were most likely triggered by our upgrade to generic worker 13 (see bug 1524592, comment 19).
During this particular upgrade, we had to manually terminate workers running older generic worker versions. We don't normally need to do this, and I don't foresee circumstances that would call for it again, but this time the upgrade path required manual terminations. Under normal circumstances, workers terminate themselves after completing any tasks in progress, which prevents errors like those mentioned here. Apologies for the inconvenience. Aborted tasks should complete normally when retriggered.
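The normal path described here is a graceful drain. A worker asked to stop finishes the task it is running and only then exits, whereas a manual termination kills it mid-task. The sketch below illustrates that idea with a hypothetical `Worker` class. It is not how generic worker itself is implemented.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Worker:
    """Hypothetical worker that honours shutdown requests only between tasks."""

    shutdown_requested: bool = False
    completed: List[str] = field(default_factory=list)

    def run(
        self,
        tasks: List[str],
        on_task_done: Callable[["Worker", str], None] = lambda w, t: None,
    ) -> List[str]:
        """Run tasks in order; return the tasks left unstarted."""
        for index, task in enumerate(tasks):
            if self.shutdown_requested:
                # Graceful drain: stop before claiming new work, never mid-task.
                return tasks[index:]
            self.completed.append(task)  # stand-in for actually executing the task
            on_task_done(self, task)  # hook where a shutdown request may arrive
        return []
```

Because the flag is checked only before a task is claimed, a shutdown request that arrives while task `b` runs lets `b` finish and leaves `c` unstarted for another worker to pick up. Manual termination skips that check, which is why in-flight tasks were aborted here.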

Status: REOPENED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
See Also: → 1544403