Closed Bug 781159 Opened 8 years ago Closed 5 years ago

mark tegras as dead when they timeout on 2 or more steps during an automation run

Categories

(Release Engineering :: General, defect, P3)

ARM
Android
defect

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: jmaher, Unassigned)

References

Details

(Whiteboard: [tegra])

We continue to see a lot of red jobs in our automation.  One of the highest-frequency problems is a device that is truly dead but keeps having jobs scheduled on it.  In fact, 20 of the 51 red jobs I looked at on Monday came from two devices that we continually scheduled jobs on.

My proposal here is that if a tegra hits the timeout on 2 or more steps, we push it into a dead pool, which we can then manually or automatically try to remediate.

By a timeout, I don't mean a failure; I mean hitting the really long maximum run time:

verify.py: 1200 seconds
reboot.py: 1800 seconds
mochitest/reftest: 2400 seconds
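A minimal sketch of the proposed dead-pool check, assuming the function and file-layout names I use here (none of them are from the bug): count how many steps in a run reached their maximum time, and when two or more did, write the error.flg marker described later in this bug to take the device out of production.

```python
import os

# Maximum run times from this bug report, in seconds.
MAX_TIME = {
    "verify.py": 1200,
    "reboot.py": 1800,
    "mochitest": 2400,
    "reftest": 2400,
}

DEAD_THRESHOLD = 2  # two or more timed-out steps -> dead pool


def timed_out_steps(step_durations):
    """Return the steps whose elapsed time reached the buildbot maximum.

    step_durations maps step name -> elapsed seconds for one automation run.
    """
    return [name for name, elapsed in step_durations.items()
            if name in MAX_TIME and elapsed >= MAX_TIME[name]]


def mark_dead_if_needed(device, step_durations, foopy_dir):
    """Write error.flg for the device when the timeout threshold is hit.

    Assumes a per-device directory under foopy_dir; returns True when the
    device was pushed into the dead pool.
    """
    hung = timed_out_steps(step_durations)
    if len(hung) < DEAD_THRESHOLD:
        return False
    flag = os.path.join(foopy_dir, device, "error.flg")
    os.makedirs(os.path.dirname(flag), exist_ok=True)
    with open(flag, "w") as f:
        # Human-readable reason, as error.flg requires.
        f.write("dead pool: %d steps hit their maximum time: %s\n"
                % (len(hung), ", ".join(sorted(hung))))
    return True
```

A single hung step (e.g. only verify.py) would not trip the threshold; that keeps one-off hangs from retiring an otherwise healthy device.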

Since these time out and are terminated by buildbot, I am unable to detect them in the harness or in things like sut_tools.
Blocks: 781162
Unfortunately I don't know a good way to identify this in our normal/current automation paths.

Dustin, is there a way with buildbot to identify that previous step(s) timed out, even if we have to check specific tests, or some other way to do this with buildbot?

For clarity, the current way to take a tegra's jobs out of production is to create a file (error.flg) on the foopy; the file should contain a human-readable string explaining why the device is in an error state.
A status listener could do this - think of how MailNotifier works.
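The status-listener idea could look roughly like this. This is a hedged sketch patterned after how MailNotifier reacts to finished builds; the method name buildFinished and the step/slave attributes are assumptions about the buildbot status API, and the flag_writer callback stands in for the error.flg creation above.

```python
class DeadDeviceListener:
    """Sketch of a buildbot-style status listener that flags a device
    when two or more of its steps were killed for exceeding their
    maximum run time.  Attribute names on `build`/steps are assumed."""

    THRESHOLD = 2

    def __init__(self, flag_writer):
        # flag_writer(device, reason) is expected to create error.flg
        # on the foopy for that device.
        self.flag_writer = flag_writer

    def buildFinished(self, builderName, build, results):
        # Assumes `build` exposes getSteps(), each step carries a
        # `timed_out` flag and a `name`, and the device is `slavename`.
        hung = [s for s in build.getSteps() if s.timed_out]
        if len(hung) >= self.THRESHOLD:
            names = ", ".join(s.name for s in hung)
            self.flag_writer(
                build.slavename,
                "%s: %d steps timed out (%s)" % (builderName, len(hung), names))
```

Whatever the real hooks are, the point is the same: the decision runs on the master, where buildbot knows a step was killed for time, instead of on the device, which may be too dead to report anything.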
Priority: -- → P3
Whiteboard: [tegra]
Product: mozilla.org → Release Engineering
Both tegras and pandas are dead.
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WONTFIX
Component: General Automation → General