Closed Bug 797324 Opened 12 years ago Closed 12 years ago

Find specific "talosError:" cases that aren't easily caused on Try & can be set to RETRY

Categories

(Release Engineering :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: emorley, Assigned: emorley)

References

(Blocks 1 open bug)

Details

(Whiteboard: [sheriff-want])

Attachments

(1 file, 1 obsolete file)

Indicates a fatal talos run failure, so we didn't get any results and should RETRY.
No longer blocks: 794895
I have a vague recollection of someone (maybe philor?) not wanting to do this. In any case, it's easy to do by adding it to the list of global errors: https://github.com/mozilla/buildbotcustom/blob/master/status/errors.py#L5
Attached patch Patch v1 (obsolete) — Splinter Review
Attachment #667437 - Flags: review?(bhearsum)
(In reply to Ben Hearsum [:bhearsum] from comment #1)
> I have a vague recollection of someone (maybe philor?) not wanting to do
> this. In any case, it's easy to do by adding it to the list of global
> errors:
> https://github.com/mozilla/buildbotcustom/blob/master/status/errors.py#L5

Sorry didn't get a mid-air on this. Would you prefer the global list or the one I added in the patch? philor is CCed and I'll make sure he agrees before landing.
Attached patch Patch v2 — Splinter Review
Adding to the general errors list, since bhearsum mentioned that adding log_eval_func overrides the general list and we still want those on talos (and talosError won't come up anywhere else).
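For readers unfamiliar with the mechanism, here is a minimal sketch of how a global error list of this kind could work. Every name in it (`SUCCESS`, `RETRY`, `GLOBAL_ERRORS`, `evaluate_log`) is a hypothetical stand-in rather than the actual contents of buildbotcustom's errors.py, and the "No space left on device" entry is only an illustrative placeholder.

```python
import re

# Hypothetical status constants; the real ones come from buildbot.
SUCCESS, RETRY = "success", "retry"

# Hypothetical global list: (compiled pattern, status to set on a match).
GLOBAL_ERRORS = (
    (re.compile(r"No space left on device"), RETRY),  # placeholder entry
    (re.compile(r"talosError:"), RETRY),  # blanket rule, as proposed in Patch v2
)


def evaluate_log(log_text, default=SUCCESS):
    """Return the status of the first pattern found in the log, else `default`."""
    for pattern, status in GLOBAL_ERRORS:
        if pattern.search(log_text):
            return status
    return default
```

With a blanket pattern like this, any log line containing "talosError:" would set RETRY, regardless of what caused it.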
Attachment #667437 - Attachment is obsolete: true
Attachment #667437 - Flags: review?(bhearsum)
Attachment #667463 - Flags: review?(bhearsum)
Attachment #667463 - Flags: review?(bhearsum) → review+
Comment on attachment 667463: Patch v2

With mobile and their... ways, the first thing you have to ask yourself while setting RETRY is "is there any way on earth that a patch can cause this condition (because if so a patch will cause it, and I'll have 200 jobs retrying all at once)?"

A patch that causes the browser to not start up presents as "talosError: 'failed to initialize browser'" and in fact the two logs in that bug from yesterday are how long it took me to realize I was starring a completely busted push, a completely busted push that stayed in the tree for probably 20 or 30 pushes.

There probably are individual talosErrors that can retry because they can't be caused by a patch, though given how the ateam tests talos (and the other harness bits that can be tested there because they are in m-c) with try pushes, that's risky too - I killed someone's push on try this weekend because it added a devicemanager.DMError to robocop, so it had retried 30 times by the time I saw it.

Even if you figure this is okay because the retries will coalesce and we'll just kill the running jobs on the push before we backed out bustage, you still have to consider that this will cause infinite retries on unnoticed try pushes from someone breaking mobile and not understanding that their push will continue to run forever.
Attachment #667463 - Flags: feedback-
Ah, good point; I should have thought of that :-/ I'll try looking into specific cases (like you say), but I'm guessing there won't be many of them (other than the already filed bug 716800).
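A rough sketch of what the narrower, case-by-case approach might look like, assuming an explicit allowlist of messages that a patch cannot trigger. All names here are hypothetical, and the allowlisted message is a made-up placeholder, not a real talos error; the only real message used is the "failed to initialize browser" case from the comment above, which must not retry.

```python
import re

# Hypothetical status constants; the real ones come from buildbot.
SUCCESS, FAILURE, RETRY = "success", "failure", "retry"

# Only messages believed impossible for a patch to cause belong here.
# The entry below is an invented placeholder, not a real talos message.
RETRYABLE_TALOS_ERRORS = (
    re.compile(r"talosError: 'hypothetical infrastructure hiccup'"),
)

# Any other talosError is treated as an ordinary failure.
ANY_TALOS_ERROR = re.compile(r"talosError:")


def classify_talos_log(log_text):
    """RETRY only for allowlisted messages; other talosErrors fail normally."""
    if any(p.search(log_text) for p in RETRYABLE_TALOS_ERRORS):
        return RETRY
    if ANY_TALOS_ERROR.search(log_text):
        return FAILURE
    return SUCCESS
```

The allowlist defaults to not retrying, so a new patch-induced talosError would fail once rather than retry indefinitely.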
Assignee: bmo → nobody
Summary: "talosError:" should set RETRY → Find specific "talosError:" cases that aren't easily caused on Try & can be set to RETRY
Status: ASSIGNED → NEW
Given comment 5, the fact that this was just wallpaper, and that we can now star talosErrors so they are less of a burden: WONTFIX.
Assignee: nobody → bmo
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WONTFIX
Blocks: 816584
Blocks: 829371
Product: mozilla.org → Release Engineering
Component: General Automation → General