"WARNING: Unable to ping tegra after 5 attempts" should set RETRY

RESOLVED FIXED

Status

Product: Release Engineering
Component: General
Priority: P2
Severity: normal
Status: RESOLVED FIXED
Reported: 6 years ago
Modified: 2 months ago

People

(Reporter: philor, Assigned: philor)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [sheriff-want])

Attachments

(1 attachment, 1 obsolete attachment)

(Assignee)

Description

6 years ago
A log_eval_func should be able to see "WARNING: Unable to ping tegra after 5 attempts" and retrigger for us, so we don't have to see bug 781419 10 or 15 times a day.
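As a minimal sketch of the idea, assume a log evaluation function that scans a step's log text against a list of (regex, status) pairs and returns the worst matching status. The names `TEGRA_ERRORS` and `evaluate_log` are hypothetical stand-ins, and the status constants are assumed to mirror buildbot's result ordering, in which RETRY outranks FAILURE.

```python
import re

# Hypothetical status constants, assumed to mirror buildbot's result ordering:
# a higher value is a "worse" result, and RETRY is the worst of all.
SUCCESS, WARNINGS, FAILURE, SKIPPED, EXCEPTION, RETRY = range(6)

# Hypothetical list of (compiled regex, status) pairs checked against the log.
# \d+ is used instead of the literal 5 so a changed attempt count still matches.
TEGRA_ERRORS = [
    (re.compile(r"WARNING: Unable to ping tegra after \d+ attempts"), RETRY),
]


def evaluate_log(log_text, base_status=SUCCESS, patterns=TEGRA_ERRORS):
    """Return the worst status among base_status and every matching pattern."""
    status = base_status
    for regex, regex_status in patterns:
        if regex.search(log_text):
            # Keep whichever result is worse (higher in the assumed ordering).
            status = max(status, regex_status)
    return status


if __name__ == "__main__":
    sample = "INFO: starting tests\nWARNING: Unable to ping tegra after 5 attempts\n"
    print(evaluate_log(sample) == RETRY)  # True
```

Because the worst status wins, a job that already failed for another reason still ends up as RETRY when the ping warning appears, which is the "retrigger for us" behavior the description asks for.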

Updated

6 years ago
Whiteboard: [sheriff-want]
(Assignee)

Comment 1

6 years ago
Probably stuck behind a yak-shaving dependency chain. The reasonable place to stick the regex is in the broadly named tegra_errors, but that list includes a rather broad "Automation error: Error" -> FAILURE pattern that might not play well with verify.py. Looking for things that might trip over it shows that bug 790613 is changing messages to put "Automation Error" in front of them, including this one, but that bug is stuck behind bug 781341.
Assignee: nobody → philringnalda
Status: NEW → ASSIGNED
Depends on: 790613
Priority: -- → P3
Whiteboard: [sheriff-want]

(Assignee)

Comment 2

6 years ago
Created attachment 661543 [details] [diff] [review]
quick and dirty

This would be prettier with the full message, and with full knowledge of which messages some future sut_tools will produce, so I'd know whether it's safe to just add to tegra_errors instead. But I noticed that this is actually number 5 on http://brasstacks.mozilla.com/orangefactor/?display=OrangeFactor, so I'd rather retry now than retry more prettily at some random, unknowable future time.
Attachment #661543 - Flags: review?(bhearsum)
(Assignee)

Updated

6 years ago
Priority: P3 → P2

Comment 3

Comment on attachment 661543 [details] [diff] [review]
quick and dirty

NOTE: This will only empower the unittest jobs; talos jobs would need this elsewhere. If we do the whole "Automation Error: Unable..." thing, I think we can stick this in the generic tegra_errors.

I'll let Ben address the choices in the real patch, with my statements here as a rough guide.

Also, I'm inclined to deploy bug 790613 now if it would help this (it takes moments to deploy). It's not really blocked behind bug 781341 at all, since 790613 has already landed and is relatively easy to deploy (it's just a manual deploy for now).
Attachment #661543 - Flags: feedback+
(Assignee)

Comment 4

6 years ago
Yeah, I glossed over talos because it makes me stabby. It does addCleanupSteps(), runs talos, then does addCleanupSteps() again, which means it runs verify.py twice. That makes us fail much more often in talos than in other suites, and makes me reluctant to set RETRY after a successful run. Not that I haven't done that before, mind you, but still, I'm reluctant to do it again.
(Assignee)

Comment 5

6 years ago
Created attachment 661834 [details] [diff] [review]
less dirty

This time it reuses tegra_errors, by turning the existing entry from what looks like a generic pattern catching automation errors into what it really is: a match for one specific message (the only existing message whose description of the automation error starts with the word "Error"). We have to turn that message into FAILURE because it happens during the test step, where it would otherwise wind up orange instead of red.
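To illustrate the narrowing, here is a sketch contrasting a broad prefix pattern with a pattern anchored to one complete line. The message text is a hypothetical placeholder (the actual message is not quoted in this bug), and `BROAD`, `SPECIFIC`, and `matching_lines` are illustrative names only, not the patch's code.

```python
import re

# Broad pattern: any line containing this prefix would be treated as FAILURE.
BROAD = re.compile(r"Automation error: Error")

# Narrow pattern: only this one complete line matches. The message text is a
# hypothetical placeholder, not the real message from the patch.
SPECIFIC = re.compile(r"^Automation error: Error placeholder message$")


def matching_lines(regex, log_text):
    """Return the log lines on which the regex finds a match."""
    return [line for line in log_text.splitlines() if regex.search(line)]


if __name__ == "__main__":
    log = (
        "Automation error: Error placeholder message\n"
        "Automation error: Error from some other tool\n"
    )
    print(len(matching_lines(BROAD, log)))     # 2
    print(len(matching_lines(SPECIFIC, log)))  # 1
```

Anchoring with `^` and `$` on each line means a new message that merely shares the prefix no longer gets swept into FAILURE, which leaves room to add a separate RETRY entry alongside it.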
Attachment #661543 - Attachment is obsolete: true
Attachment #661543 - Flags: review?(bhearsum)
Attachment #661834 - Flags: review?(bhearsum)
Attachment #661834 - Flags: review?(bhearsum) → review+
(Assignee)

Updated

6 years ago
Depends on: 792318
(Assignee)

Updated

6 years ago
Status: ASSIGNED → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering
Component: General Automation → General
Product: Release Engineering → Release Engineering