Closed Bug 1525288 Opened 5 years ago Closed 5 years ago

[jittest] Better handle intermittent adb errors

Categories

(Core :: JavaScript Engine, defect, P3)

RESOLVED FIXED
mozilla67
Tracking Status
firefox67 --- fixed

People

(Reporter: bc, Assigned: bc)

Attachments

(1 file)

While I believe bug 1524352 will help in situations such as bug 1518650, where test cases with expected errors were polluting the suggested bugs in Treeherder, it has not eliminated the problem with the adb errors error: closed [1] and error: device <serial> not found [2].

[1] https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=226113784&repo=autoland&lineNumber=8165
[2] https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=225927129&repo=autoland&lineNumber=9928

One question I have about ignoring tests: how do we ensure that we do not ignore the entire test suite by accident?

Priority: -- → P3
Component: JavaScript Engine: JIT → JavaScript Engine

(In reply to Nicolas B. Pierron [:nbp] from comment #1)

One question I have about ignoring tests: how do we ensure that we do not ignore the entire test suite by accident?

The error: closed and error: device <serial> not found errors do not appear to affect subsequent tests. So long as we localize our changes to the specific test affected, the other tests in the suite will not be affected.

After several attempts, I do not think my initial approach is workable. That approach was to flag the intermittent error in the except ADBProcessError as e: block [1] in run_test_remote, then handle it as one of the special cases in the if rc != test.expect_status: block [2] in check_output.

Instead, I am now attempting to treat the affected test as if it were skipped, since there does not appear to be any means of reliably determining the test's pass/fail status once the adb communication error has occurred for that individual test. "Skipping" it seems the most reasonable fallback.
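As a rough illustration of the "treat it as skipped" idea, here is a minimal sketch. It is not the attached patch: ADBProcessErrorStandIn is a stand-in for mozdevice's ADBProcessError, and IGNORABLE_ADB_ERRORS, is_ignorable_adb_error and run_one_test are hypothetical names. The two patterns are taken from the error messages quoted in this bug.

```python
import re

# Hypothetical patterns, based only on the two adb messages quoted in this bug.
IGNORABLE_ADB_ERRORS = [
    re.compile(r"error: closed"),
    re.compile(r"error: device .* not found"),
]


class ADBProcessErrorStandIn(Exception):
    """Stand-in for mozdevice's ADBProcessError; not the real class."""


def is_ignorable_adb_error(message):
    """Return True if the message matches a known intermittent adb error."""
    return any(p.search(message) for p in IGNORABLE_ADB_ERRORS)


def run_one_test(test_path, run_on_device):
    """Run one test via the callable run_on_device (an assumption here).

    Returns ("ran", return_code) normally, or ("skip", reason) when an
    ignorable adb error makes the pass/fail status unknowable.
    Any other adb error is re-raised so it is not silently hidden.
    """
    try:
        rc = run_on_device(test_path)
    except ADBProcessErrorStandIn as e:
        if is_ignorable_adb_error(str(e)):
            reason = "Skipping %s due to ignorable adb error %s" % (test_path, e)
            return ("skip", reason)
        raise
    return ("ran", rc)
```

Because the decision is made per test and unknown errors are re-raised, an unexpected failure mode cannot cause the whole suite to be ignored by accident.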

However, this is insufficient to deal with all of the failures related to device errors. For example, we see several failures per day when pushing the libraries and tests to the device. Since the error messages for these failures tend to be unique, it is difficult for the sheriffs to classify them. I am going to experiment with a patch to make these errors more easily identifiable. I've adjusted the bug summary to match.

As an example where these errors are causing problems with triaging jittest failures, see [3] where an expected ReferenceError failed due to a zero return code but was misclassified as an example of bug 1518628 error: closed.

nbp: Would you be a good person to review the patches?

[1] https://searchfox.org/mozilla-central/source/js/src/tests/lib/jittests.py#436
[2] https://searchfox.org/mozilla-central/source/js/src/tests/lib/jittests.py#507
[3] https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&resultStatus=testfailed%2Cbusted%2Cexception%2Crunnable&tier=1%2C2%2C3&searchStr=android-hw%2Cjit&selectedJob=226725792

Summary: [jittest] Ignore intermittent adb communication errors in check_output → [jittest] Better handle intermittent adb errors

(In reply to Bob Clary [:bc:] from comment #2)

nbp: Would you be a good person to review the patches?

sfink or I would be good reviewers for the jit-test harness.

Try run with --rebuild 20
https://treeherder.mozilla.org/#/jobs?repo=try&revision=7f9a3adfb81dcf4815d131e808c369015511b663

We see one device failure at bitbar before the test begins running:
TEST-UNEXPECTED-FAIL | bitbar | ADBDevice.init: ls could not be found attempting to clean up device

One device failure attempting to set up the device for the test:
TEST-UNEXPECTED-FAIL | jit_test.py : Device initialization failed

One device failure in mozharness to connect to the device:
ADBError: ADBDevice.init: ls could not be found

And 5 intermittent adb connection errors out of the 400 test runs:

https://taskcluster-artifacts.net/fWfXgF-0TkmY2wMn5MCtkQ/0/public/logs/live_backing.log

Skipping /builds/worker/workspace/build/tests/jit-test/jit-test/tests/basic/testDivModWithIntMin.js due to ignorable adb error error: device 'HT83K1A02572' not found

https://taskcluster-artifacts.net/ZieBqwweT--n6k2qNeWYfw/0/public/logs/live_backing.log

Skipping /builds/worker/workspace/build/tests/jit-test/jit-test/tests/ion/bug1264948.js due to ignorable adb error error: device 'FA84C1A00154' not found

https://taskcluster-artifacts.net/XfTRkvo4QcS1l1oJxRJHgQ/0/public/logs/live_backing.log

Skipping /builds/worker/workspace/build/tests/jit-test/jit-test/tests/structured-clone/Map-Set-cross-compartment.js due to ignorable adb error error: device 'HT83K1A02597' not found

https://taskcluster-artifacts.net/JMBdYdt9QDu8Ouh4aaZ0nw/0/public/logs/live_backing.log

Skipping /builds/worker/workspace/build/tests/jit-test/jit-test/tests/basic/bug820124-1.js due to ignorable adb error error: device 'FA83V1A02389' not found

https://taskcluster-artifacts.net/BlITfWFCTwmpOLyCsKbW5w/0/public/logs/live_backing.log

Skipping /builds/worker/workspace/build/tests/jit-test/jit-test/tests/ion/bug1365769-2.js due to ignorable adb error error: device 'FA84C1A00167' not found

These all have log messages of the form:

TEST-PASS | tests/jit-test/jit-test/tests/ion/bug1365769-2.js | Success (code 59, args "")

which shows that the test was treated as skipped. The logs also show that the remaining tests continued to run. In the future, it may be helpful to include the number of skipped tests in the job details.
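For counting the skipped tests, something along these lines could work as a starting point. This is only a sketch that scans log text for the "Skipping ... due to ignorable adb error ..." lines shown above; SKIP_RE and summarize_skips are hypothetical names, not part of the harness.

```python
import re

# Matches the skip lines quoted in this comment; the format is taken
# from those log excerpts and may differ elsewhere.
SKIP_RE = re.compile(
    r"^Skipping (?P<test>\S+) due to ignorable adb error (?P<error>.*)$"
)


def summarize_skips(log_lines):
    """Return a list of (test_path, adb_error) for each skipped test."""
    skipped = []
    for line in log_lines:
        m = SKIP_RE.match(line.strip())
        if m:
            skipped.append((m.group("test"), m.group("error")))
    return skipped
```

The length of the returned list is the number of tests skipped in that log, which is the figure that could be reported in the job details.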

We didn't see any examples of the adb error: closed error, nor of an uncaught ADB error, during the test runs.

Attachment #9042957 - Flags: review?(nicolas.b.pierron)
Comment on attachment 9042957 [details] [diff] [review]
bug-1525288-jittest-intermittent-adb-errors.patch

Review of attachment 9042957 [details] [diff] [review]:
-----------------------------------------------------------------

This sounds good to me, but I am not an expert in ADB/mozdevice bindings.
Feel free to get some mozdevice peer feedback if you feel that this might be needed.
Attachment #9042957 - Flags: review?(nicolas.b.pierron) → review+
Pushed by bclary@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/2563d7cfc1d2
[jittest] Better handle intermittent adb errors, r=nbp.
See Also: → 1517646
See Also: → 1518719
See Also: → 1524844
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla67