Closed Bug 1335501 Opened 8 years ago Closed 8 years ago

Provide better error handling for xpcshell tests when the Android emulator becomes unresponsive

Categories

(Firefox for Android Graveyard :: Testing, defect, P2)

defect

Tracking

(firefox52 fixed, firefox53 fixed, firefox54 fixed)

RESOLVED FIXED
Firefox 54
Tracking Status
firefox52 --- fixed
firefox53 --- fixed
firefox54 --- fixed

People

(Reporter: gbrown, Assigned: gbrown)

References

Details

Attachments

(1 file)

Sometimes the Android emulator becomes unresponsive during a job. For most test jobs, that triggers an automatic retry, but for xpcshell tests, that is usually reported as a test failure, which causes confusion and wastes developer time.
Priority: -- → P2
Some examples: [task 2016-11-29T23:58:17.338984Z] 23:58:17 INFO - TEST-START | netwerk/test/unit/test_progress.js [task 2016-11-30T00:03:17.339826Z] 00:03:17 WARNING - TEST-UNEXPECTED-TIMEOUT | netwerk/test/unit/test_progress.js | Test timed out [task 2016-11-30T00:03:17.339990Z] 00:03:17 INFO - TEST-INFO took 300000ms [task 2016-11-30T00:03:27.379045Z] 00:03:27 INFO - mozdevice Timeout exceeded for shell call 'adb shell /data/local/xpcb/xpcw /storage/sdcard/tests/xpc/netwerk/test/unit -r /storage/sdcard/tests/xpc/c/httpd.manifest --greomni /data/local/xpcb/target.apk -m -s -e 'const _HEAD_JS_PATH = "/storage/sdcard/tests/xpc/head.js";' -e 'const _MOZINFO_JS_PATH = "/storage/sdcard/tests/xpc/p/mozinfo.json";' -e 'const _TESTING_MODULES_DIR = "/storage/sdcard/tests/xpc/m";' -f /storage/sdcard/tests/xpc/head.js -e 'const _SERVER_ADDR = "localhost"' -e 'const _HEAD_FILES = ["/storage/sdcard/tests/xpc/netwerk/test/unit/head_channels.js", "/storage/sdcard/tests/xpc/netwerk/test/unit/head_cache.js", "/storage/sdcard/tests/xpc/netwerk/test/unit/head_cache2.js"];' -e 'const _TAIL_FILES = [];' -e 'const _JSDEBUGGER_PORT = 0;' -e 'const _TEST_FILE = ["test_progress.js"];' -e 'const _TEST_NAME = "netwerk/test/unit/test_progress.js"' -e '_execute_test(); quit(0);'; echo $?' [task 2016-11-30T00:03:47.379151Z] 00:03:47 INFO - mozdevice Timeout exceeded for _runCmd call 'adb shell ps' [task 2016-11-30T00:03:47.382686Z] 00:03:47 INFO - xpcshell return code: 93 [task 2016-11-30T00:03:57.425446Z] 00:03:57 INFO - mozdevice Timeout exceeded for _runCmd call 'adb shell ps' [task 2016-11-30T00:04:27.468833Z] 00:04:27 INFO - mozdevice Timeout exceeded for _runCmd call 'adb shell ps' [task 2016-11-30T00:04:27.473445Z] 00:04:27 WARNING - TEST-UNEXPECTED-FAIL | Received SIGINT (control-C), so stopped run. (Use --keep-going to keep running tests after killing one with SIGINT) 05:40:49 INFO - TEST-START | netwerk/test/unit/test_bug767025.js 05:45:49 WARNING - TEST-UNEXPECTED-TIMEOUT | netwerk/test/unit/test_bug767025.js | Test timed out 05:45:49 INFO - TEST-INFO took 300096ms 05:45:59 INFO - mozdevice Timeout exceeded for shell call 'adb shell /data/local/xpcb/xpcw /storage/sdcard/tests/xpc/netwerk/test/unit -r /storage/sdcard/tests/xpc/c/httpd.manifest --greomni /data/local/xpcb/fennec-53.0a1.en-US.android-arm.apk -m -s -e 'const _HEAD_JS_PATH = "/storage/sdcard/tests/xpc/head.js";' -e 'const _MOZINFO_JS_PATH = "/storage/sdcard/tests/xpc/p/mozinfo.json";' -e 'const _TESTING_MODULES_DIR = "/storage/sdcard/tests/xpc/m";' -f /storage/sdcard/tests/xpc/head.js -e 'const _SERVER_ADDR = "localhost"' -e 'const _HEAD_FILES = ["/storage/sdcard/tests/xpc/netwerk/test/unit/head_channels.js", "/storage/sdcard/tests/xpc/netwerk/test/unit/head_cache.js", "/storage/sdcard/tests/xpc/netwerk/test/unit/head_cache2.js"];' -e 'const _TAIL_FILES = [];' -e 'const _JSDEBUGGER_PORT = 0;' -e 'const _TEST_FILE = ["test_bug767025.js"];' -e 'const _TEST_NAME = "netwerk/test/unit/test_bug767025.js"' -e '_execute_test(); quit(0);'; echo $?' 05:46:20 INFO - mozdevice Timeout exceeded for _runCmd call 'adb shell ps' 05:46:20 INFO - xpcshell return code: 1483450857825 05:46:30 INFO - mozdevice Timeout exceeded for _runCmd call 'adb shell ps' 05:47:00 INFO - mozdevice Timeout exceeded for _runCmd call 'adb shell ps' 05:47:00 WARNING - TEST-UNEXPECTED-FAIL | Received SIGINT (control-C), so stopped run. (Use --keep-going to keep running tests after killing one with SIGINT) 22:27:19 INFO - TEST-START | netwerk/test/unit/test_redirect_failure.js 22:32:19 WARNING - TEST-UNEXPECTED-TIMEOUT | netwerk/test/unit/test_redirect_failure.js | Test timed out 22:32:19 INFO - TEST-INFO expected FAIL | took 300076ms 22:32:29 INFO - mozdevice Timeout exceeded for shell call 'adb shell /data/local/xpcb/xpcw /storage/sdcard/tests/xpc/netwerk/test/unit -r /storage/sdcard/tests/xpc/c/httpd.manifest --greomni /data/local/xpcb/fennec-53.0a1.en-US.android-arm.apk -m -s -e 'const _HEAD_JS_PATH = "/storage/sdcard/tests/xpc/head.js";' -e 'const _MOZINFO_JS_PATH = "/storage/sdcard/tests/xpc/p/mozinfo.json";' -e 'const _TESTING_MODULES_DIR = "/storage/sdcard/tests/xpc/m";' -f /storage/sdcard/tests/xpc/head.js -e 'const _SERVER_ADDR = "localhost"' -e 'const _HEAD_FILES = ["/storage/sdcard/tests/xpc/netwerk/test/unit/head_channels.js", "/storage/sdcard/tests/xpc/netwerk/test/unit/head_cache.js", "/storage/sdcard/tests/xpc/netwerk/test/unit/head_cache2.js"];' -e 'const _TAIL_FILES = [];' -e 'const _JSDEBUGGER_PORT = 0;' -e 'const _TEST_FILE = ["test_redirect_failure.js"];' -e 'const _TEST_NAME = "netwerk/test/unit/test_redirect_failure.js"' -e '_execute_test(); quit(0);'; echo $?' 22:32:49 INFO - mozdevice Timeout exceeded for _runCmd call 'adb shell ps' 22:32:49 INFO - xpcshell return code: 53 22:33:00 INFO - mozdevice Timeout exceeded for _runCmd call 'adb shell ps' 22:33:30 INFO - mozdevice Timeout exceeded for _runCmd call 'adb shell ps' 22:33:30 WARNING - TEST-UNEXPECTED-FAIL | Received SIGINT (control-C), so stopped run. (Use --keep-going to keep running tests after killing one with SIGINT) 19:00:58 INFO - TEST-START | netwerk/test/unit/test_range_requests.js 19:05:59 WARNING - TEST-UNEXPECTED-TIMEOUT | netwerk/test/unit/test_range_requests.js | Test timed out 19:05:59 INFO - TEST-INFO took 300077ms 19:06:09 INFO - mozdevice Timeout exceeded for shell call 'adb shell /data/local/xpcb/xpcw /storage/sdcard/tests/xpc/netwerk/test/unit -r /storage/sdcard/tests/xpc/c/httpd.manifest --greomni /data/local/xpcb/fennec-53.0a1.en-US.android-arm.apk -m -s -e 'const _HEAD_JS_PATH = "/storage/sdcard/tests/xpc/head.js";' -e 'const _MOZINFO_JS_PATH = "/storage/sdcard/tests/xpc/p/mozinfo.json";' -e 'const _TESTING_MODULES_DIR = "/storage/sdcard/tests/xpc/m";' -f /storage/sdcard/tests/xpc/head.js -e 'const _SERVER_ADDR = "localhost"' -e 'const _HEAD_FILES = ["/storage/sdcard/tests/xpc/netwerk/test/unit/head_channels.js", "/storage/sdcard/tests/xpc/netwerk/test/unit/head_cache.js", "/storage/sdcard/tests/xpc/netwerk/test/unit/head_cache2.js"];' -e 'const _TAIL_FILES = [];' -e 'const _JSDEBUGGER_PORT = 0;' -e 'const _TEST_FILE = ["test_range_requests.js"];' -e 'const _TEST_NAME = "netwerk/test/unit/test_range_requests.js"' -e '_execute_test(); quit(0);'; echo $?' 19:06:29 INFO - mozdevice Timeout exceeded for _runCmd call 'adb shell ps' 19:06:29 INFO - xpcshell return code: 1480388468118 19:06:39 INFO - mozdevice Timeout exceeded for _runCmd call 'adb shell ps' 19:07:09 INFO - mozdevice Timeout exceeded for _runCmd call 'adb shell ps' 19:07:09 WARNING - TEST-UNEXPECTED-FAIL | Received SIGINT (control-C), so stopped run. (Use --keep-going to keep running tests after killing one with SIGINT)
"DMError" in a test log triggers a retry: https://dxr.mozilla.org/mozilla-central/rev/1d025ac534a6333a8170a59a95a8a3673d4028ee/testing/mozharness/mozharness/mozilla/testing/errors.py#101. That DMError check works for mochitests and reftests because those tests use remoteautomation.py for process monitoring of the remote browser, and the remoteautomation.py code usually fails on a devicemanagerADB _checkCmd() call, which raises DMError. In contrast, xpcshell tests don't use remoteautomation.py (having different, arguably simpler process management requirements). The xpcshell wrapper is launched with shell(), and the xpcshell timeout mechanism triggers proc.kill() and proc.poll() calls that ultimately use devicemanager.getProcessList(). DM.getProcessList() call _runCmd('adb shell ps') -- not _checkCmd(). It wouldn't be appropriate to throw DMError in _runCmd: it is intended as a non-throwing alternative to _checkCmd(). It might be OK to change DM.getProcessList() to use _checkCmd() instead of _runCmd(), but I worry that might have a de-stabilizing effect on our various DeviceManager clients. Perhaps the least dangerous path forward is to retry on the existing message, "Timeout exceeded for _runCmd call 'adb shell ps'", or perhaps "Timeout exceeded for _runCmd call". If an adb command times out, I think that is likely an indication of a test environment problem, rather than a test problem.
See my reasoning in comment 25. I hope it will be this simple. It seems to do no harm: https://treeherder.mozilla.org/#/jobs?repo=try&revision=6af77bea86e1bcece4fc7e902d943c807c6eed07
Attachment #8833100 - Flags: review?(jmaher)
What luck! We encountered the xpcshell-ps-timeout problem and retried successfully: https://treeherder.mozilla.org/#/jobs?repo=try&revision=6af77bea86e1bcece4fc7e902d943c807c6eed07&selectedJob=74071719
Comment on attachment 8833100 [details] [diff] [review] retry on _runCmd() timeout Review of attachment 8833100 [details] [diff] [review]: ----------------------------------------------------------------- oh great! I am curious why we have the twisted error in the regex, isn't that specific to buildbot?
Attachment #8833100 - Flags: review?(jmaher) → review+
I think the twisted error is specific to buildbot, and probably no longer needed, but I'm not sure - I won't mess with it, today.
Pushed by gbrown@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/06c3be84254c Trigger test job retry when Android devicemanager _runCmd() times out; r=jmaher
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 54
Whiteboard: [checkin-needed-aurora][checkin-needed-beta]
Whiteboard: [checkin-needed-aurora][checkin-needed-beta] → [checkin-needed-beta]
Flags: in-testsuite+
Whiteboard: [checkin-needed-beta]
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: