Some android-em test tasks fail to retry on ADBTimeoutError
Categories
(Firefox for Android Graveyard :: Testing, defect, P2)
Tracking
(Not tracked)
People
(Reporter: gbrown, Assigned: gbrown)
References
(Blocks 1 open bug)
Details
Attachments
(1 file, 1 obsolete file)
5.80 KB,
patch
|
bc
:
review+
|
Details | Diff | Splinter Review |
Noticed these examples in bug 1500266. https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=214822075&repo=mozilla-central&lineNumber=987 https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=214583874&repo=autoland&lineNumber=1310 https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=215592086&repo=mozilla-central&lineNumber=1003
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Comment 1•5 years ago
|
||
More examples: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=216733447&repo=autoland&lineNumber=2411 https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=216678105&repo=autoland&lineNumber=1642 https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=216663234&repo=mozilla-inbound&lineNumber=2764 https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=216336407&repo=mozilla-central&lineNumber=982 https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=216336405&repo=mozilla-central&lineNumber=972
Assignee | ||
Comment 2•5 years ago
|
||
The most common failures are in is_boot_completed and install_apk; in both cases, these would retry if the exception was handled. https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&revision=bff993881a4c8dd9c67244db53d21936d62f3ad6
Assignee | ||
Comment 3•5 years ago
|
||
Additional cases: - per-test runs, where it appears the logger has matched for retry, but the exit code is over-ridden - task timeouts
Pushed by gbrown@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/52fa52aeb6ee Handle more ADBTimeoutError edge cases in mozharness; r=me,a=test-only
Comment 5•5 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/52fa52aeb6ee
Assignee | ||
Comment 6•5 years ago
|
||
There are very few of these now. Here's a recent TV failure:
Assignee | ||
Comment 7•5 years ago
|
||
Comment 8•5 years ago
|
||
I noticed that it suffers multiple ADBTimeoutErrors through repeated initializations of ADBDevice. I think creating an ADBDevice once, then passing it to XPCShellRemote rather than doing it each time would be better. It also appears that running the test after the ADBTimeoutError to get the 10 iterations is guaranteed to fail once the device enters the ADBTimeoutError state. Can you terminate the test once you get a time out?
Assignee | ||
Comment 9•5 years ago
|
||
Thanks :bc.
The way I see it, the issue is that the mozharness logic for test verification (and per-test coverage) keeps going even after a test harness has failed in such a way that TBPL_RETRY would be triggered: eg. android_emulator_unittest calls the xpcshell harness, it fails with ADBTimeoutError, and then android_emulator_unittest calls the xpcshell harness again, for the next file...and the next...
I think that behavior is okay - possibly even desirable - for many failures, but not for ADBTimeoutError, and probably not for anything that would cause TBPL_RETRY (which I equate with, "there's an unrecoverable infrastructure problem").
Here I force persistent ADBTimeoutError in the Android xpcshell harness:
and we get quick task retries on Android; desktop continues just fine too.
I didn't bother updating android_hardware_unittest.py since you don't run test-verify there, and don't support task retries (right?)...but that would be easy to change, if you prefer.
Comment 10•5 years ago
|
||
Comment on attachment 9036020 [details] [diff] [review] abandon multi-test test-verification once TBPL_RETRY is hit Review of attachment 9036020 [details] [diff] [review]: ----------------------------------------------------------------- Let's do android_hardware_unittest.py as well since it does include the CodeCoverageMixin and therefore the SingleTestMixin. Might as well keep them in sync so if someone does try to run code coverage or test verification they won't be surprised. ::: testing/mozharness/scripts/android_emulator_unittest.py @@ +433,5 @@ > self.record_status(tbpl_status, level=log_level) > self.log_per_test_status(per_test_args[-1], tbpl_status, log_level) > + if tbpl_status == TBPL_RETRY: > + self.info("Per-test run abandoned due to RETRY status") > + break Do we want break here? Shouldn't we return False? ::: testing/mozharness/scripts/web_platform_tests.py @@ +446,5 @@ > if len(per_test_args) > 0: > self.log_per_test_status(per_test_args[-1], tbpl_status, log_level) > + if tbpl_status == TBPL_RETRY: > + self.info("Per-test run abandoned due to RETRY status") > + return Why is web_platform_tests.py inconsistent with the others in that it does not return a bool?
Assignee | ||
Comment 11•5 years ago
|
||
OK, android_hardware_unittest.py is included now.
In all of these scripts, run_tests() is a mozharness "step" function, which usually doesn't return a value. I've changed all the run_tests break and return statements to simple "return" (including modifying an existing, unrelated "return False").
That makes the change to desktop_unittest appear inconsistent, but there we are returning from _run_category_suites() -- which should return False.
Comment 12•5 years ago
|
||
Comment on attachment 9036066 [details] [diff] [review] abandon multi-test test-verification once TBPL_RETRY is hit Review of attachment 9036066 [details] [diff] [review]: ----------------------------------------------------------------- ::: testing/mozharness/scripts/desktop_unittest.py @@ +944,5 @@ > if len(per_test_args) > 0: > self.log_per_test_status(per_test_args[-1], tbpl_status, log_level) > + if tbpl_status == TBPL_RETRY: > + self.info("Per-test run abandoned due to RETRY status") > + return False Don't forget this one.
Assignee | ||
Comment 13•5 years ago
|
||
(In reply to Bob Clary [:bc:] from comment #12)
Comment on attachment 9036066 [details] [diff] [review]
abandon multi-test test-verification once TBPL_RETRY is hitReview of attachment 9036066 [details] [diff] [review]:
::: testing/mozharness/scripts/desktop_unittest.py
@@ +944,5 @@if len(per_test_args) > 0: self.log_per_test_status(per_test_args[-1], tbpl_status, log_level)
if tbpl_status == TBPL_RETRY:
self.info("Per-test run abandoned due to RETRY status")
return False
Don't forget this one.
That one should return False: desktop_unittest.py has this code in a different function, _run_category_suites(), which needs to return a boolean.
Comment 14•5 years ago
|
||
Pushed by gbrown@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/500d91e30e9d Improve task retry handling in test-verify; r=bc
Comment 15•5 years ago
|
||
bugherder |
Assignee | ||
Comment 17•5 years ago
|
||
No new android-em failures in bug 1500266.
There is the remaining possibility of a task timeout interfering with the RETRY; I think that's best addressed by avoiding task timeouts / appropriate max-run-time settings.
Good enough for me!
Updated•5 years ago
|
Updated•3 years ago
|
Description
•