Closed Bug 793091 Opened 12 years ago Closed 12 years ago

Treat all verify.py failures as cause for RETRY

Categories

(Release Engineering :: General, defect)

ARM
Android
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: philor, Assigned: philor)

References

(Blocks 1 open bug)

Details

(Whiteboard: [sheriff-want][mobile])

Attachments

(1 file, 1 obsolete file)

Having verify.py use the regex_log_evaluator to look for "unable to ping tegra" is vastly nicer than having no log_eval_func at all, but it's still far short of what we need. Given a custom function that can look at all of cmd.logs, not just line by line, we could:

* set retry when the log is empty (bug 660480: the "process killed by signal 15" part isn't actually in the log, so for verify.py it's just an empty log) or when the trim()med log is empty (bug 686084, which appears to have some whitespace but otherwise nothing when it hits verify.py)
* set retry when the last line of the log is either "reconnecting socket" or "unable to connect socket"

And probably other things I haven't yet thought of. It wouldn't get rid of all the Android pain, even from those three bugs, since they're all capable of hitting in other buildsteps, but because verify.py does a pretty good job of being the step that fails, it would get the pain down to tolerable levels. Unfortunately, I don't think there's an evaluator in the tree close enough that I could copy it, so it's probably going to take someone who knows what they are doing.
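For illustration only, the whole-log checks proposed above might look roughly like this sketch. It is not the patch that eventually landed, and it does not import buildbot: the function name `evaluate_verify_log`, the choice to pass the combined log as a plain string plus the exit code, and the local status constants (assumed to mirror those in `buildbot.status.builder`) are all hypothetical.

```python
# Hypothetical stand-ins for buildbot's result codes; real code would import
# SUCCESS, FAILURE and RETRY from buildbot.status.builder instead.
SUCCESS, FAILURE, RETRY = 0, 2, 5

# Last-line messages that, per the proposal, mean the tegra dropped off.
RETRY_LAST_LINES = ("reconnecting socket", "unable to connect socket")


def evaluate_verify_log(log_text, rc):
    """Sketch of a whole-log evaluator for verify.py (hypothetical).

    log_text: all of the step's log output joined into one string.
    rc: the exit code of verify.py.
    """
    # Empty or whitespace-only log (bugs 660480 and 686084): retry.
    if not log_text.strip():
        return RETRY
    # Compare the last non-blank line, stripped, against the known messages.
    lines = [line.strip() for line in log_text.splitlines() if line.strip()]
    if lines[-1] in RETRY_LAST_LINES:
        return RETRY
    # Otherwise fall back to the exit code.
    return SUCCESS if rc == 0 else FAILURE
```

Matching the last line exactly is an assumption; real logs might carry prefixes or timestamps that would call for a looser comparison.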
Whiteboard: [sheriff-want]
Whiteboard: [sheriff-want] → [sheriff-want][mobile]
Attached patch from orbit (obsolete)
Actually, grovelling around in the logs looking for particular phrases or particular whitespace would be silly. The point of verify.py is to poke at the tegra and see whether or not it's alive enough to go ahead with a test run on it. There are only two answers to that question, 0: yes and !0: find another tegra. More than usual, I wish I had access to a staging environment to test this in, since we don't have any current uses of the None == default thing, and I don't remember if we ever have, but it *looks* like it ought to work.
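The rule above reduces to a two-way mapping from verify.py's exit code to a step result. The sketch below shows that mapping in plain Python; it is not the attached patch itself, and the function name `verify_result` and the local status constants (assumed to mirror those in `buildbot.status.builder`) are illustrative.

```python
# Hypothetical stand-ins for buildbot's result codes; the real config imports
# them from buildbot.status.builder (RETRY had to be added to that import).
SUCCESS, RETRY = 0, 5


def verify_result(rc):
    """Map verify.py's exit code to a step result (illustrative sketch).

    0 means the tegra is alive enough to run tests on; anything else means
    "find another tegra", so the build is retried rather than failed.
    """
    return SUCCESS if rc == 0 else RETRY
```

Treating every non-zero exit as RETRY avoids having to recognise each individual failure mode in the log text.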
Attachment #663838 - Flags: review?(catlee)
Comment on attachment 663838 (from orbit)
I'll throw this through staging tomorrow; it looks great, though! (And it's exactly what I had talked about wanting done for this step.) (f? on myself as a reminder to stage it.)
Attachment #663838 - Flags: review?(catlee) → review+
Attachment #663838 - Flags: feedback?(bugspam.Callek)
(In reply to Phil Ringnalda (:philor) from comment #1)
> There are only two answers to that question, 0: yes and !0: find
> another tegra.

Simple yet incredibly awesome! :-D
Blocks: 778688
Tweaking summary to match patch. If at a later date we decide not to blanket RETRY, I'll file another bug for creating a custom log_eval_func.
Assignee: nobody → philringnalda
Status: NEW → ASSIGNED
Summary: verify.py needs a custom log_eval_func → Treat all verify.py failures as cause for RETRY
Comment on attachment 663838 (from orbit)
Patch failed checkconfig; it needed |RETRY| added to line 77:
from buildbot.status.builder import SUCCESS, FAILURE
So simple that I'm not denying review for it, but it needs fixing before checkin.
Like so? :-)
Attachment #663838 - Attachment is obsolete: true
Attachment #663838 - Flags: feedback?(bugspam.Callek)
Comment on attachment 664841 (With review comment)
Perfect, and it works like a charm in staging.
Attachment #664841 - Flags: feedback+
Deployed :-)
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering
Component: General Automation → General