Task retry does not work for Android 7.0 x86 wpt tasks

RESOLVED FIXED in Firefox -esr60

Status

defect
P1
normal
RESOLVED FIXED
9 months ago
7 months ago

People

(Reporter: gbrown, Assigned: gbrown)

Tracking

Version 3
mozilla65
Points:
---

Firefox Tracking Flags

(firefox-esr60 fixed, firefox65 fixed)

Details

Attachments

(1 attachment)

Android 4.3 tasks automatically retry on ADBTimeoutError and ADBError, but I haven't seen that mechanism working on Android 7.0 x86. Also, kwierso pointed out these two logs:

https://treeherder.mozilla.org/logviewer.html#?job_id=210771442&repo=try
https://treeherder.mozilla.org/logviewer.html#?job_id=210868661&repo=try

I think at least the first one should have retried.
The task definition looks correct:

https://tools.taskcluster.net/groups/ctottUVpQpCq66CxrwCK0g/tasks/N0q8YhF1QzqILO88tjndbQ/details

  "onExitStatus": {
    "purgeCaches": [
      72
    ],
    "retry": [
      4,
      72
    ]
  },
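
As a rough illustration of what this fragment asks for, the sketch below checks an exit status against the `retry` and `purgeCaches` lists. The helper name `classify_exit` and its return value are hypothetical, and this is not the worker's actual code; only the `onExitStatus` values come from the task definition above.

```python
import json

# The onExitStatus fragment from the task definition, wrapped in braces
# so that it parses as standalone JSON.
PAYLOAD = json.loads("""
{
  "onExitStatus": {
    "purgeCaches": [72],
    "retry": [4, 72]
  }
}
""")


def classify_exit(exit_code, payload):
    """Hypothetical helper: report what a given exit status should trigger."""
    rules = payload.get("onExitStatus", {})
    return {
        "retry": exit_code in rules.get("retry", []),
        "purge_caches": exit_code in rules.get("purgeCaches", []),
    }


if __name__ == "__main__":
    for code in (0, 4, 72):
        print(code, classify_exit(code, PAYLOAD))
```

Under this reading, a correct task definition is not sufficient on its own: the task only retries if the harness actually exits with one of the listed statuses.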
Retry works for mochitest regardless of platform:

https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&revision=6d8491a77cd5b8b261d5554d22e65e71391e3871

Maybe different output parser for wpt?
(In reply to Geoff Brown [:gbrown] from comment #2)
> Maybe different output parser for wpt?

Yes, that seems to be the issue. The other Android test tasks use DesktopUnittestOutputParser, which checks the retry regex; Android wpt uses StructuredOutputParser, which does not check that regex.
See Also: → 1507560
Priority: -- → P1
Summary: Verify task retry for Android 7.0 x86 tasks → Task retry does not work for Android 7.0 x86 wpt tasks
The DesktopUnittestOutputParser already supports TBPL_RETRY. There's no convenient way to move that support to the base class, which doesn't know about self.tbpl_status and is currently free of higher-level mozharness dependencies.

In addition to detecting retry conditions and setting tbpl_status, we need to be careful not to clobber TBPL_RETRY when doing the summary processing required for TV and similar runs.
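
The two requirements above can be sketched roughly as follows. This is a simplified stand-in, not the patch itself: the class `SketchStructuredParser`, its methods, the plain-text `TEST-UNEXPECTED` check, and the retry pattern (limited to the two error names mentioned in this bug) are all assumptions made for illustration, and the status constants are redefined locally so the example is self-contained.

```python
import re

# Status names mirror the constants discussed in this bug; they are
# redefined here so the sketch does not depend on mozharness.
TBPL_SUCCESS = "SUCCESS"
TBPL_FAILURE = "FAILURE"
TBPL_RETRY = "RETRY"

# Assumed retry pattern: only the two error names mentioned in this bug.
RETRY_RE = re.compile(r"ADBError|ADBTimeoutError")


class SketchStructuredParser:
    """Hypothetical, simplified stand-in for a structured-log output parser."""

    def __init__(self):
        self.tbpl_status = TBPL_SUCCESS
        self.failures = 0

    def parse_single_line(self, line):
        # Detect retry conditions as each line arrives.
        if RETRY_RE.search(line):
            self.tbpl_status = TBPL_RETRY
        # Simplified failure detection on plain text.
        if "TEST-UNEXPECTED" in line:
            self.failures += 1

    def evaluate(self):
        # Summary processing: work out pass/fail, but never overwrite a
        # retry status that was set while parsing.
        if self.tbpl_status == TBPL_RETRY:
            return TBPL_RETRY
        self.tbpl_status = TBPL_FAILURE if self.failures else TBPL_SUCCESS
        return self.tbpl_status
```

The point of the early return in `evaluate` is that a run with both test failures and an ADB error still ends up as a retry rather than a failure.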

Here's an example with retry forced by an ADBError:
https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&revision=0009814cdd5e1401f9b7a2f74fe9027730940920

and another try run to check that normal success/failure status is generally okay:
https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&revision=5ad175d3d559319c6a7624eaf2b638f89be290a1
Attachment #9026576 - Flags: review?(jmaher)
Comment on attachment 9026576 [details] [diff] [review]
support TBPL_RETRY in structured logging output parser

Review of attachment 9026576 [details] [diff] [review]:
-----------------------------------------------------------------

we only have to update structuredlog.py ?
Attachment #9026576 - Flags: review?(jmaher) → review+
(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #6)
> we only have to update structuredlog.py ?

Yes, I think so.
Pushed by gbrown@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/09abaef41d5e
Support TBPL_RETRY in structured logger, to enable task retry; r=jmaher
https://hg.mozilla.org/mozilla-central/rev/09abaef41d5e
Status: NEW → RESOLVED
Closed: 8 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla65
Should this have made the `ERROR - adb get_process_list:` failures trigger a retry, or was comment 0 just kinda hoping it might have been fixed along with the `ls could not be found` case?

Because I rebased onto the merge to m-c that included this patch and I'm still getting failures like https://treeherder.mozilla.org/logviewer.html#?job_id=213332123&repo=try that aren't retrying.
And this one hit the get_process_list error twice in under a second: https://treeherder.mozilla.org/logviewer.html#?job_id=213332149&repo=try&lineNumber=2532
Retry will be triggered by "ADBError" or "ADBTimeoutError" in the log, but not this strange get_process_list error -- I filed bug 1509324 for that and will follow up.
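
A small check makes the distinction concrete. The pattern below is an assumption built only from the two strings named in this comment, not the harness's actual retry regex, and `triggers_retry` is a hypothetical helper.

```python
import re

# Assumed pattern: only the two strings named in this comment.
RETRY_RE = re.compile(r"ADBError|ADBTimeoutError")


def triggers_retry(line):
    """Hypothetical helper: True if the log line matches the retry pattern."""
    return RETRY_RE.search(line) is not None


if __name__ == "__main__":
    for line in ("ADBError", "ADBTimeoutError", "ERROR - adb get_process_list:"):
        print(repr(line), triggers_retry(line))
```

The get_process_list line contains "ERROR" and "adb" but neither of the two strings as written, so a case-sensitive match on those names does not fire.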