Bug 940699 (Closed): Opened 9 years ago, Closed 9 years ago

Trigger a retry for metro talos suites that fail to initialize metro browser

Categories: Release Engineering :: General, defect
Platform: x86 Windows 8
Priority: Not set
Severity: normal
Tracking: Not tracked
Status: RESOLVED FIXED
People: Reporter: jlund; Assigned: jlund
Attachments: 1 file, 2 obsolete files

No description provided.
In general I don't like this. I would love to figure out the root cause, but the time I've spent on it so far hasn't got me there.

If we can get an auto-retry added in, then we will be able to see other issues.  How do we determine the retry?  Is this a return code from Talos?  A specific error message output that some log parser interprets?

Please let me know if we need to adjust talos for this.
Any time the slave config isn't working right, the harness will pick up the error and dump a note in the output:

12:15:01     INFO -  INFO | metrotestharness.exe | Launching browser...
12:15:01     INFO -  INFO | metrotestharness.exe | App model id='E4CFE2E6B75AA3A3'
12:15:01     INFO -  INFO | metrotestharness.exe | Harness process id: 3860
12:15:01     INFO -  INFO | metrotestharness.exe | Using bin path: 'C:\slave\test\build\a..
12:15:01     INFO -  INFO | metrotestharness.exe | Writing out tests.ini to: 'C:\slave\..
12:15:01     INFO -  FAIL-SHOULD-RETRY | metrotestharness.exe | ActivateApplication result 80270254
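The failure line above is what the log parser keys off. A minimal sketch of that idea (hypothetical; the real mozharness OutputParser matches against an error_list of regexes rather than a single pattern):

```python
import re

# Marker the harness emits when the launch failed in a retryable way.
RETRY_PATTERN = re.compile(r"FAIL-SHOULD-RETRY")

def needs_retry(log_lines):
    """Return True if any log line carries the harness retry marker."""
    return any(RETRY_PATTERN.search(line) for line in log_lines)

log = [
    "INFO | metrotestharness.exe | Launching browser...",
    "FAIL-SHOULD-RETRY | metrotestharness.exe | ActivateApplication result 80270254",
]
print(needs_retry(log))  # True
```

When a line like this is seen, the job's worst status can be escalated to retry instead of failing red.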
I have tried painstakingly hard to reproduce the fail-should-retry scenario. I may have done over 20 sendchanges to no avail. However, it at least does not produce any mozharness errors.

Aki, I don't think we used buildbot_status() for talos before, so this now produces log lines like:
12:08:32     INFO - # TBPL SUCCESS #

I thought this would be OK and the only way to signal a retry?
Attachment #8341241 - Flags: review?(aki)
This fixes the retry source code comments. There were some grammar/logic mistakes.

The diff from previous is:
https://github.com/lundjordan/mozharness/commit/dd243415f41637ae8af2bec288ede284a7bd7f8e
Attachment #8341241 - Attachment is obsolete: true
Attachment #8341241 - Flags: review?(aki)
Attachment #8341251 - Flags: review?(aki)
Comment on attachment 8341251 [details] [diff] [review]
bug_940699_trigger_retry_metro_fail.021213-4.diff

>+        if self.return_code not in [0]:
>+            # update the worst log level and tbpl status
>+            parser.worst_log_level = parser.worst_level(ERROR,
>+                                                        parser.worst_log_level)
>+            parser.worst_tbpl_status = parser.worst_level(
>+                TBPL_FAILURE, parser.worst_tbpl_status,
>+                levels=TBPL_WORST_LEVEL_TUPLE
>+            )
>+        self.buildbot_status(parser.worst_tbpl_status,
>+                             level=parser.worst_log_level)

It's not clear to me why we want to set these two variables (worst_{log_level,tbpl_status}) in the parser object, as opposed to two local variables.
I don't think it's harmful at all, though.
Attachment #8341251 - Flags: review?(aki) → review+
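The worst_level() calls in the quoted diff pick the more severe of two statuses against an ordered tuple. A hypothetical stand-in for the mozharness helper, with assumed level names, to show the semantics:

```python
# Ordered worst-to-best; the real TBPL_WORST_LEVEL_TUPLE in mozharness
# uses its own status constants, these names are illustrative.
TBPL_WORST_LEVEL_TUPLE = ("retry", "failure", "warning", "success")

def worst_level(level, existing, levels=TBPL_WORST_LEVEL_TUPLE):
    """Return whichever of the two levels is more severe per `levels`."""
    return level if levels.index(level) < levels.index(existing) else existing

print(worst_level("failure", "warning"))  # failure
print(worst_level("success", "retry"))   # retry
```

Storing the running worst value on the parser object (as the patch does) means every parsed chunk can escalate, but never downgrade, the final status.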
(In reply to Aki Sasaki [:aki] from comment #6)
> It's not clear to me why we want to set these two variables
> (worst_{log_level,tbpl_status}) in the parser object, as opposed to two
> local variables.
> I don't think it's harmful at all, though.

N/m, I see those already exist.  I should have checked that before replying :)
(In reply to Jordan Lund (:jlund) from comment #3)
> Created attachment 8341241 [details] [diff] [review]
> like with unittests, talos will use the harness retry error pattern:
> FAIL-SHOULD-RETRY and tell buildbot to retry if parsed
> 
> I have tried painstackingly hard to reproduce the fail should retry
> scenario. I may have done over 20 sendchanges to no avail. However, it at
> least does not produce any mozharn errors.
> 
> Aki, I don't think we used buildbot_status() for talos before, so this now
> produces log lines like:
> 12:08:32     INFO - # TBPL SUCCESS #
> 
> I thought this would be OK and the only way to signal a retry?

This depends on what's running the job.
http://hg.mozilla.org/build/buildbotcustom/file/8756b09ddc1c/misc.py#l446 only looks at the exit code, not the string, which is the preferred way because:

* buildbot log parsing is expensive on the master
* truncated logs will not show this string
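In the spirit of that exit-code-only evaluation, here is a hypothetical sketch (not the actual buildbotcustom code) of mapping a harness return code to a buildbot-style status, with 1 treated as a viable "orange" code as discussed below:

```python
SUCCESS, WARNINGS, FAILURE = "success", "warnings", "failure"

def evaluate_exit_code(code, viable=(0, 1)):
    """Map a harness exit code to a buildbot-style status.

    0 is success, 1 (test oranges) is treated as warnings, anything
    else is a failure. A caller could still escalate to retry when
    the in-process parser saw FAIL-SHOULD-RETRY.
    """
    if code == 0:
        return SUCCESS
    if code in viable:
        return WARNINGS
    return FAILURE

print(evaluate_exit_code(0))  # success
print(evaluate_exit_code(1))  # warnings
print(evaluate_exit_code(2))  # failure
```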
As per our email thread, this is the same patch with 1 (orange) included in the list of viable return codes.
Attachment #8341251 - Attachment is obsolete: true
Attachment #8342605 - Flags: review?(aki)
Attachment #8342605 - Flags: review?(aki) → review+
Attachment #8342605 - Flags: checked-in+
In production
I think this is resolved. The fail-should-retry was parsed, and it looks like it took precedence over the return code and triggered a retry:
https://tbpl.mozilla.org/php/getParsedLog.php?id=31752089&tree=Cedar
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
(In reply to Jordan Lund (:jlund) from comment #12)
> I think this is resolved. the fail-should-retry was parsed and looks like it
> persisted over the return code and triggered a retry:
> https://tbpl.mozilla.org/php/getParsedLog.php?id=31752089&tree=Cedar

Hmm, I don't think this is going to accomplish what we want. With mochitests, the retry happens on a new slave, so the test runs can complete. Looks like we are retrying on the same slave three times and then bailing? That won't help, since it's the slave's startup browser config that's hosed.
> Hmm, don't think this is going to accomplish what we want. With mochitests,
> the retry happens on a new slave, so the test runs can complete. Looks like
> we are retrying on the same slave three times then bailing? That won't help
> since it's the slaves startup browser config that's hosed.

Shoot. I'll look into this more. I must be missing something.

I'll explain what I see so we are on the same page: from what I can tell in the log I sent, the metroharness launch is retried three times. I think we have always done this (it was just never highlighted). What I was referring to in that log is how I am parsing those "FAIL-SHOULD-RETRY" lines and marking them as "Critical". I then add log output at the end: "# TBPL RETRY #". Before, jobs in this scenario showed up red in tbpl:
https://tbpl.mozilla.org/php/getParsedLog.php?id=31510103&tree=Cedar

Now when we look at tbpl, this shows that the buildbot build is signaled as a retry, and another build is started. So if you look here:
https://tbpl.mozilla.org/?tree=Cedar&jobname=svgr-metro

In the above, there is a recent blue (critical, retry) build and a green (success) one beside it in the same Cedar push.

Is this what you are seeing and I am still mistaken or missing something?
Ah, my mistake, all I did was look at the log. Tbpl looks right!
Component: General Automation → General