Bug 940699 - Trigger a retry for metro talos suites that fail to initialize metro browser
Status: RESOLVED FIXED
Opened 11 years ago
Closed 11 years ago
Categories: Release Engineering :: General, defect
Tracking: Not tracked
People: Reporter: jlund; Assigned: jlund
Attachments (1 file, 2 obsolete files)
2.87 KB, patch | mozilla: review+ | jlund: checked-in+ | Details | Diff | Splinter Review
No description provided.
Comment 1•11 years ago
In general I don't like this; I would love to figure out the root cause, but the time already spent on this hasn't got me there. If we can get an auto-retry added, then we will be able to see other issues. How do we determine the retry? Is this a return code from Talos? A specific error message in the output that some log parser interprets? Please let me know if we need to adjust talos for this.
Comment 2•11 years ago
Any time the slave config isn't working right, the harness will pick up the error and dump a note in the output:

12:15:01 INFO - INFO | metrotestharness.exe | Launching browser...
12:15:01 INFO - INFO | metrotestharness.exe | App model id='E4CFE2E6B75AA3A3'
12:15:01 INFO - INFO | metrotestharness.exe | Harness process id: 3860
12:15:01 INFO - INFO | metrotestharness.exe | Using bin path: 'C:\slave\test\build\a..
12:15:01 INFO - INFO | metrotestharness.exe | Writing out tests.ini to: 'C:\slave\..
12:15:01 INFO - FAIL-SHOULD-RETRY | metrotestharness.exe | ActivateApplication result 80270254
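The FAIL-SHOULD-RETRY marker is the kind of token a harness log parser can key on. A minimal sketch of such a classifier, assuming a hypothetical pattern list and function name (this is not mozharness's actual OutputParser API):

```python
import re

# Hypothetical retry patterns; mozharness keeps similar entries in an
# error_list that its output parser walks for every log line.
RETRY_PATTERNS = [
    re.compile(r"FAIL-SHOULD-RETRY"),
]

def classify_line(line):
    """Return 'RETRY' if the harness flagged a retryable failure, else None."""
    for pattern in RETRY_PATTERNS:
        if pattern.search(line):
            return "RETRY"
    return None
```

A parser like this would bump the job's worst status to retry as soon as one matching line appears, regardless of later output.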
Comment 3•11 years ago (Assignee)
I have tried painstackingly hard to reproduce the fail-should-retry scenario. I may have done over 20 sendchanges to no avail. However, it at least does not produce any mozharn errors.

Aki, I don't think we used buildbot_status() for talos before, so this now produces log lines like:

12:08:32 INFO - # TBPL SUCCESS #

I thought this would be OK and the only way to signal a retry?
Attachment #8341241 -
Flags: review?(aki)
Comment 4•11 years ago (Assignee)
s/painstackingly/painstakingly
Comment 5•11 years ago (Assignee)
This fixes the retry src code comments. There were some grammar/logic mistakes. The diff from the previous patch is: https://github.com/lundjordan/mozharness/commit/dd243415f41637ae8af2bec288ede284a7bd7f8e
Attachment #8341241 -
Attachment is obsolete: true
Attachment #8341241 -
Flags: review?(aki)
Attachment #8341251 -
Flags: review?(aki)
Comment 6•11 years ago
Comment on attachment 8341251 [details] [diff] [review]
bug_940699_trigger_retry_metro_fail.021213-4.diff

>+        if self.return_code not in [0]:
>+            # update the worst log level and tbpl status
>+            parser.worst_log_level = parser.worst_level(ERROR,
>+                                                        parser.worst_log_level)
>+            parser.worst_tbpl_status = parser.worst_level(
>+                TBPL_FAILURE, parser.worst_tbpl_status,
>+                levels=TBPL_WORST_LEVEL_TUPLE
>+            )
>+            self.buildbot_status(parser.worst_tbpl_status,
>+                                 level=parser.worst_log_level)

It's not clear to me why we want to set these two variables (worst_{log_level,tbpl_status}) in the parser object, as opposed to two local variables. I don't think it's harmful at all, though.
Attachment #8341251 -
Flags: review?(aki) → review+
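The worst_level() calls quoted in comment 6 pick the more severe of two levels against an ordered tuple. A sketch of that comparison, with illustrative level tuples (mozharness defines its own constants; the exact orderings here are assumptions):

```python
# Worst-first orderings; illustrative stand-ins for mozharness's constants.
LOG_WORST_LEVEL_TUPLE = ("FATAL", "CRITICAL", "ERROR", "WARNING", "INFO", "DEBUG")
TBPL_WORST_LEVEL_TUPLE = ("RETRY", "EXCEPTION", "FAILURE", "WARNING", "SUCCESS")

def worst_level(target, existing, levels=LOG_WORST_LEVEL_TUPLE):
    """Return whichever of target/existing sits earlier (worse) in levels."""
    if existing is None:
        return target
    return target if levels.index(target) <= levels.index(existing) else existing
```

Note that with this ordering a RETRY status outranks FAILURE, which is why a parsed FAIL-SHOULD-RETRY line can win over a plain failing return code later in the thread.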
Comment 7•11 years ago
(In reply to Aki Sasaki [:aki] from comment #6)
> It's not clear to me why we want to set these two variables
> (worst_{log_level,tbpl_status}) in the parser object, as opposed to two
> local variables.
> I don't think it's harmful at all, though.

N/m, I see those already exist. I should have checked that before replying :)
Comment 8•11 years ago
(In reply to Jordan Lund (:jlund) from comment #3)
> Created attachment 8341241 [details] [diff] [review]
> like with unittests, talos will use the harness retry error pattern:
> FAIL-SHOULD-RETRY and tell buildbot to retry if parsed
>
> I have tried painstackingly hard to reproduce the fail should retry
> scenario. I may have done over 20 sendchanges to no avail. However, it at
> least does not produce any mozharn errors.
>
> Aki, I don't think we used buildbot_status() for talos before, so this now
> produces log lines like:
> 12:08:32 INFO - # TBPL SUCCESS #
>
> I thought this would be OK and the only way to signal a retry?

This depends on what's running the job. http://hg.mozilla.org/build/buildbotcustom/file/8756b09ddc1c/misc.py#l446 only looks at the exit code, not the string, which is the preferred way because:

* buildbot log parsing is expensive on the master
* truncated logs will not show this string
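Since the buildbot master acts on the exit code rather than the log string, the script side has to translate the TBPL status into a distinct return code (while still emitting the "# TBPL ... #" line for human readers). A sketch with an illustrative mapping; the actual numeric codes used by mozharness/buildbotcustom may differ:

```python
# Illustrative status-to-exit-code mapping; not the authoritative values.
EXIT_STATUS = {
    "SUCCESS": 0,
    "WARNING": 1,
    "FAILURE": 2,
    "EXCEPTION": 3,
    "RETRY": 4,
}

def buildbot_status(tbpl_status):
    """Emit the TBPL marker line, then return the exit code that buildbot
    can inspect without parsing (possibly truncated) logs."""
    print("# TBPL %s #" % tbpl_status)
    return EXIT_STATUS[tbpl_status]
```

A script would then sys.exit() with that value, so the master sees the retry even if the log never reaches it intact.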
Comment 9•11 years ago (Assignee)
As per our email thread, the same patch with 1 (orange) included in the list of viable return codes.
Attachment #8341251 -
Attachment is obsolete: true
Attachment #8342605 -
Flags: review?(aki)
Updated•11 years ago
Attachment #8342605 -
Flags: review?(aki) → review+
Comment 10•11 years ago (Assignee)
Pushed to default: https://hg.mozilla.org/build/mozharness/rev/1913406e2d96
Updated•11 years ago (Assignee)
Attachment #8342605 -
Flags: checked-in+
Comment 11•11 years ago
In production
Comment 12•11 years ago (Assignee)
I think this is resolved. The fail-should-retry was parsed, and it looks like it took precedence over the return code and triggered a retry: https://tbpl.mozilla.org/php/getParsedLog.php?id=31752089&tree=Cedar
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Comment 13•11 years ago
(In reply to Jordan Lund (:jlund) from comment #12)
> I think this is resolved. the fail-should-retry was parsed and looks like it
> persisted over the return code and triggered a retry:
> https://tbpl.mozilla.org/php/getParsedLog.php?id=31752089&tree=Cedar

Hmm, I don't think this is going to accomplish what we want. With mochitests, the retry happens on a new slave, so the test runs can complete. It looks like we are retrying on the same slave three times and then bailing? That won't help, since it's the slave's startup browser config that's hosed.
Comment 14•11 years ago (Assignee)
> Hmm, don't think this is going to accomplish what we want. With mochitests,
> the retry happens on a new slave, so the test runs can complete. Looks like
> we are retrying on the same slave three times then bailing? That won't help
> since it's the slaves startup browser config that's hosed.

Shoot. I'll look into this more; I must be missing something. I'll explain what I see so we are on the same page for sure: from what I can tell in the log I sent, there are three 'retry' attempts on the metroharness launch. I think we have always done this (but it was never highlighted)?

What I was referring to in that log is how I am parsing those "FAIL-SHOULD-RETRY" lines and marking them as "Critical". I then add log output at the end: "# TBPL RETRY #". Before, we had jobs that were red in tbpl for this scenario: https://tbpl.mozilla.org/php/getParsedLog.php?id=31510103&tree=Cedar

Now when we look at tbpl, this shows that the buildbot build is signaled as a retry, and another build is started. So if you look here: https://tbpl.mozilla.org/?tree=Cedar&jobname=svgr-metro

In the above, there is a blue (critical, retry) build recently and a green (success) one beside it in the same cedar push. Is this what you are seeing, and am I still mistaken or missing something?
Comment 15•11 years ago
Ah, my mistake; all I did was look at the log. tbpl looks right!
Updated•6 years ago
Component: General Automation → General