Provide better error handling for xpcshell tests when the Android emulator becomes unresponsive

RESOLVED FIXED in Firefox 52

Status

()

P2
normal
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: gbrown, Assigned: gbrown)

Tracking

unspecified
Firefox 54
Points:
---
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(firefox52 fixed, firefox53 fixed, firefox54 fixed)

Details

Attachments

(1 attachment)

(Assignee)

Description

2 years ago
Sometimes the Android emulator becomes unresponsive during a job. For most test jobs, that triggers an automatic retry, but for xpcshell tests, that is usually reported as a test failure, which causes confusion and wastes developer time.
(Assignee)

Updated

2 years ago
Duplicate of this bug: 1333712
(Assignee)

Updated

2 years ago
Priority: -- → P2
(Assignee)

Updated

2 years ago
Duplicate of this bug: 1296299
(Assignee)

Updated

2 years ago
Duplicate of this bug: 1303495
(Assignee)

Updated

2 years ago
Duplicate of this bug: 1304589
(Assignee)

Updated

2 years ago
Duplicate of this bug: 1308366
(Assignee)

Updated

2 years ago
Duplicate of this bug: 1308665
(Assignee)

Updated

2 years ago
Duplicate of this bug: 1309871
(Assignee)

Updated

2 years ago
Duplicate of this bug: 1311086
(Assignee)

Updated

2 years ago
Duplicate of this bug: 1315129
(Assignee)

Updated

2 years ago
Duplicate of this bug: 1315418
(Assignee)

Updated

2 years ago
Duplicate of this bug: 1317710
(Assignee)

Updated

2 years ago
Duplicate of this bug: 1319089
(Assignee)

Updated

2 years ago
Duplicate of this bug: 1321671
(Assignee)

Updated

2 years ago
Duplicate of this bug: 1321163
(Assignee)

Updated

2 years ago
Duplicate of this bug: 1328265
(Assignee)

Updated

2 years ago
Duplicate of this bug: 1325249
(Assignee)

Updated

2 years ago
Duplicate of this bug: 1324969
(Assignee)

Updated

2 years ago
Duplicate of this bug: 1324291
(Assignee)

Updated

2 years ago
Duplicate of this bug: 1321983
(Assignee)

Updated

2 years ago
Duplicate of this bug: 1320979
(Assignee)

Updated

2 years ago
Duplicate of this bug: 1320206
(Assignee)

Updated

2 years ago
Duplicate of this bug: 1311577
Duplicate of this bug: 1324236
(Assignee)

Comment 24

2 years ago
Some examples:

[task 2016-11-29T23:58:17.338984Z] 23:58:17     INFO -  TEST-START | netwerk/test/unit/test_progress.js
[task 2016-11-30T00:03:17.339826Z] 00:03:17  WARNING -  TEST-UNEXPECTED-TIMEOUT | netwerk/test/unit/test_progress.js | Test timed out
[task 2016-11-30T00:03:17.339990Z] 00:03:17     INFO -  TEST-INFO took 300000ms
[task 2016-11-30T00:03:27.379045Z] 00:03:27     INFO -  mozdevice Timeout exceeded for shell call 'adb shell /data/local/xpcb/xpcw /storage/sdcard/tests/xpc/netwerk/test/unit -r /storage/sdcard/tests/xpc/c/httpd.manifest --greomni /data/local/xpcb/target.apk -m -s -e 'const _HEAD_JS_PATH = "/storage/sdcard/tests/xpc/head.js";' -e 'const _MOZINFO_JS_PATH = "/storage/sdcard/tests/xpc/p/mozinfo.json";' -e 'const _TESTING_MODULES_DIR = "/storage/sdcard/tests/xpc/m";' -f /storage/sdcard/tests/xpc/head.js -e 'const _SERVER_ADDR = "localhost"' -e 'const _HEAD_FILES = ["/storage/sdcard/tests/xpc/netwerk/test/unit/head_channels.js", "/storage/sdcard/tests/xpc/netwerk/test/unit/head_cache.js", "/storage/sdcard/tests/xpc/netwerk/test/unit/head_cache2.js"];' -e 'const _TAIL_FILES = [];' -e 'const _JSDEBUGGER_PORT = 0;' -e 'const _TEST_FILE = ["test_progress.js"];' -e 'const _TEST_NAME = "netwerk/test/unit/test_progress.js"' -e '_execute_test(); quit(0);'; echo $?'
[task 2016-11-30T00:03:47.379151Z] 00:03:47     INFO -  mozdevice Timeout exceeded for _runCmd call 'adb shell ps'
[task 2016-11-30T00:03:47.382686Z] 00:03:47     INFO -  xpcshell return code: 93
[task 2016-11-30T00:03:57.425446Z] 00:03:57     INFO -  mozdevice Timeout exceeded for _runCmd call 'adb shell ps'
[task 2016-11-30T00:04:27.468833Z] 00:04:27     INFO -  mozdevice Timeout exceeded for _runCmd call 'adb shell ps'
[task 2016-11-30T00:04:27.473445Z] 00:04:27  WARNING -  TEST-UNEXPECTED-FAIL | Received SIGINT (control-C), so stopped run. (Use --keep-going to keep running tests after killing one with SIGINT)



05:40:49     INFO -  TEST-START | netwerk/test/unit/test_bug767025.js
05:45:49  WARNING -  TEST-UNEXPECTED-TIMEOUT | netwerk/test/unit/test_bug767025.js | Test timed out
05:45:49     INFO -  TEST-INFO took 300096ms
05:45:59     INFO -  mozdevice Timeout exceeded for shell call 'adb shell /data/local/xpcb/xpcw /storage/sdcard/tests/xpc/netwerk/test/unit -r /storage/sdcard/tests/xpc/c/httpd.manifest --greomni /data/local/xpcb/fennec-53.0a1.en-US.android-arm.apk -m -s -e 'const _HEAD_JS_PATH = "/storage/sdcard/tests/xpc/head.js";' -e 'const _MOZINFO_JS_PATH = "/storage/sdcard/tests/xpc/p/mozinfo.json";' -e 'const _TESTING_MODULES_DIR = "/storage/sdcard/tests/xpc/m";' -f /storage/sdcard/tests/xpc/head.js -e 'const _SERVER_ADDR = "localhost"' -e 'const _HEAD_FILES = ["/storage/sdcard/tests/xpc/netwerk/test/unit/head_channels.js", "/storage/sdcard/tests/xpc/netwerk/test/unit/head_cache.js", "/storage/sdcard/tests/xpc/netwerk/test/unit/head_cache2.js"];' -e 'const _TAIL_FILES = [];' -e 'const _JSDEBUGGER_PORT = 0;' -e 'const _TEST_FILE = ["test_bug767025.js"];' -e 'const _TEST_NAME = "netwerk/test/unit/test_bug767025.js"' -e '_execute_test(); quit(0);'; echo $?'
05:46:20     INFO -  mozdevice Timeout exceeded for _runCmd call 'adb shell ps'
05:46:20     INFO -  xpcshell return code: 1483450857825
05:46:30     INFO -  mozdevice Timeout exceeded for _runCmd call 'adb shell ps'
05:47:00     INFO -  mozdevice Timeout exceeded for _runCmd call 'adb shell ps'
05:47:00  WARNING -  TEST-UNEXPECTED-FAIL | Received SIGINT (control-C), so stopped run. (Use --keep-going to keep running tests after killing one with SIGINT)




22:27:19     INFO -  TEST-START | netwerk/test/unit/test_redirect_failure.js
22:32:19  WARNING -  TEST-UNEXPECTED-TIMEOUT | netwerk/test/unit/test_redirect_failure.js | Test timed out
22:32:19     INFO -  TEST-INFO expected FAIL | took 300076ms
22:32:29     INFO -  mozdevice Timeout exceeded for shell call 'adb shell /data/local/xpcb/xpcw /storage/sdcard/tests/xpc/netwerk/test/unit -r /storage/sdcard/tests/xpc/c/httpd.manifest --greomni /data/local/xpcb/fennec-53.0a1.en-US.android-arm.apk -m -s -e 'const _HEAD_JS_PATH = "/storage/sdcard/tests/xpc/head.js";' -e 'const _MOZINFO_JS_PATH = "/storage/sdcard/tests/xpc/p/mozinfo.json";' -e 'const _TESTING_MODULES_DIR = "/storage/sdcard/tests/xpc/m";' -f /storage/sdcard/tests/xpc/head.js -e 'const _SERVER_ADDR = "localhost"' -e 'const _HEAD_FILES = ["/storage/sdcard/tests/xpc/netwerk/test/unit/head_channels.js", "/storage/sdcard/tests/xpc/netwerk/test/unit/head_cache.js", "/storage/sdcard/tests/xpc/netwerk/test/unit/head_cache2.js"];' -e 'const _TAIL_FILES = [];' -e 'const _JSDEBUGGER_PORT = 0;' -e 'const _TEST_FILE = ["test_redirect_failure.js"];' -e 'const _TEST_NAME = "netwerk/test/unit/test_redirect_failure.js"' -e '_execute_test(); quit(0);'; echo $?'
22:32:49     INFO -  mozdevice Timeout exceeded for _runCmd call 'adb shell ps'
22:32:49     INFO -  xpcshell return code: 53
22:33:00     INFO -  mozdevice Timeout exceeded for _runCmd call 'adb shell ps'
22:33:30     INFO -  mozdevice Timeout exceeded for _runCmd call 'adb shell ps'
22:33:30  WARNING -  TEST-UNEXPECTED-FAIL | Received SIGINT (control-C), so stopped run. (Use --keep-going to keep running tests after killing one with SIGINT)




19:00:58     INFO -  TEST-START | netwerk/test/unit/test_range_requests.js
19:05:59  WARNING -  TEST-UNEXPECTED-TIMEOUT | netwerk/test/unit/test_range_requests.js | Test timed out
19:05:59     INFO -  TEST-INFO took 300077ms
19:06:09     INFO -  mozdevice Timeout exceeded for shell call 'adb shell /data/local/xpcb/xpcw /storage/sdcard/tests/xpc/netwerk/test/unit -r /storage/sdcard/tests/xpc/c/httpd.manifest --greomni /data/local/xpcb/fennec-53.0a1.en-US.android-arm.apk -m -s -e 'const _HEAD_JS_PATH = "/storage/sdcard/tests/xpc/head.js";' -e 'const _MOZINFO_JS_PATH = "/storage/sdcard/tests/xpc/p/mozinfo.json";' -e 'const _TESTING_MODULES_DIR = "/storage/sdcard/tests/xpc/m";' -f /storage/sdcard/tests/xpc/head.js -e 'const _SERVER_ADDR = "localhost"' -e 'const _HEAD_FILES = ["/storage/sdcard/tests/xpc/netwerk/test/unit/head_channels.js", "/storage/sdcard/tests/xpc/netwerk/test/unit/head_cache.js", "/storage/sdcard/tests/xpc/netwerk/test/unit/head_cache2.js"];' -e 'const _TAIL_FILES = [];' -e 'const _JSDEBUGGER_PORT = 0;' -e 'const _TEST_FILE = ["test_range_requests.js"];' -e 'const _TEST_NAME = "netwerk/test/unit/test_range_requests.js"' -e '_execute_test(); quit(0);'; echo $?'
19:06:29     INFO -  mozdevice Timeout exceeded for _runCmd call 'adb shell ps'
19:06:29     INFO -  xpcshell return code: 1480388468118
19:06:39     INFO -  mozdevice Timeout exceeded for _runCmd call 'adb shell ps'
19:07:09     INFO -  mozdevice Timeout exceeded for _runCmd call 'adb shell ps'
19:07:09  WARNING -  TEST-UNEXPECTED-FAIL | Received SIGINT (control-C), so stopped run. (Use --keep-going to keep running tests after killing one with SIGINT)
(Assignee)

Comment 25

2 years ago
"DMError" in a test log triggers a retry: https://dxr.mozilla.org/mozilla-central/rev/1d025ac534a6333a8170a59a95a8a3673d4028ee/testing/mozharness/mozharness/mozilla/testing/errors.py#101. 

That DMError check works for mochitests and reftests because those tests use remoteautomation.py for process monitoring of the remote browser, and the remoteautomation.py code usually fails on a devicemanagerADB _checkCmd() call, which raises DMError.

In contrast, xpcshell tests don't use remoteautomation.py (having different, arguably simpler process management requirements). The xpcshell wrapper is launched with shell(), and the xpcshell timeout mechanism triggers proc.kill() and proc.poll() calls that ultimately use devicemanager.getProcessList(). DM.getProcessList() call _runCmd('adb shell ps') -- not _checkCmd().

It wouldn't be appropriate to throw DMError in _runCmd: it is intended as a non-throwing alternative to _checkCmd().

It might be OK to change DM.getProcessList() to use _checkCmd() instead of _runCmd(), but I worry that might have a de-stabilizing effect on our various DeviceManager clients.

Perhaps the least dangerous path forward is to retry on the existing message, "Timeout exceeded for _runCmd call 'adb shell ps'", or perhaps "Timeout exceeded for _runCmd call". If an adb command times out, I think that is likely an indication of a test environment problem, rather than a test problem.
(Assignee)

Comment 26

2 years ago
See my reasoning in comment 25.

I hope it will be this simple. It seems to do no harm:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=6af77bea86e1bcece4fc7e902d943c807c6eed07
Attachment #8833100 - Flags: review?(jmaher)
(Assignee)

Comment 27

2 years ago
What luck! We encountered the xpcshell-ps-timeout problem and retried successfully: https://treeherder.mozilla.org/#/jobs?repo=try&revision=6af77bea86e1bcece4fc7e902d943c807c6eed07&selectedJob=74071719
Comment on attachment 8833100 [details] [diff] [review]
retry on _runCmd() timeout

Review of attachment 8833100 [details] [diff] [review]:
-----------------------------------------------------------------

oh great!  I am curious why we have the twisted error in the regex, isn't that specific to buildbot?
Attachment #8833100 - Flags: review?(jmaher) → review+
(Assignee)

Comment 29

2 years ago
I think the twisted error is specific to buildbot, and probably no longer needed, but I'm not sure - I won't mess with it, today.

Comment 30

2 years ago
Pushed by gbrown@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/06c3be84254c
Trigger test job retry when Android devicemanager _runCmd() times out; r=jmaher
(Assignee)

Updated

2 years ago
Duplicate of this bug: 1335611
(Assignee)

Updated

2 years ago
Duplicate of this bug: 1303890
(Assignee)

Updated

2 years ago
Duplicate of this bug: 1300432

Comment 34

2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/06c3be84254c
Status: NEW → RESOLVED
Last Resolved: 2 years ago
status-firefox54: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → Firefox 54
(Assignee)

Updated

2 years ago
Whiteboard: [checkin-needed-aurora][checkin-needed-beta]

Comment 35

2 years ago
bugherderuplift
https://hg.mozilla.org/releases/mozilla-aurora/rev/49db565f13ba
status-firefox53: --- → fixed
Whiteboard: [checkin-needed-aurora][checkin-needed-beta] → [checkin-needed-beta]

Comment 36

2 years ago
bugherderuplift
https://hg.mozilla.org/releases/mozilla-beta/rev/0b69d8923c19
status-firefox52: --- → fixed
Flags: in-testsuite+
Whiteboard: [checkin-needed-beta]
You need to log in before you can comment on or make changes to this bug.