Bug 878049
Opened 12 years ago
Closed 12 years ago
Create a persistent history of slave reboot attempts and outcomes for kittenherder
Categories
(Release Engineering :: General, defect, P2)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: coop, Assigned: coop)
References
Details
(Whiteboard: [slaveduty][dashboard][kittenherder])
kittenherder doesn't currently maintain a history of reboot attempts for a given slave, i.e. if the same slave appears in the slaves_needing_reboot.txt list 6 hours later, kittenherder will merrily try to reboot the slave again. This is a great opportunity for kittenherder to recognize a pattern in slave behavior and file an appropriate bug (bug 859403). To do so, we need to start tracking reboot attempts in a persistent manner, and possibly also double-check whether our reboot attempts were successful before waiting for the next cycle.
If we begin tracking state this way, it may allow us to iterate more quickly over the list of slaves needing reboot because we won't spend time on slaves that are in a known bad state.
If at all possible, the reboot history should be kept in a format (and location) that is easily digestible by other reporting tools, e.g. slave_health.
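One way such a history could be kept is as an append-only JSON-lines file, which other tools can read one line at a time. The following is only a minimal sketch under that assumption: the file format, the field names (`slave`, `timestamp`, `method`, `command_ok`) and both function names are hypothetical and are not taken from kittenherder.

```python
import json
import time
from pathlib import Path


def record_reboot_attempt(history_path, slave, method, command_ok, now=None):
    """Append one reboot attempt to a JSON-lines history file.

    All field names here are hypothetical. `command_ok` only records
    whether the reboot command itself reported success.
    """
    entry = {
        "slave": slave,
        "timestamp": int(now if now is not None else time.time()),
        "method": method,
        "command_ok": command_ok,
    }
    # Append mode keeps earlier attempts intact across runs.
    with Path(history_path).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry, sort_keys=True) + "\n")
    return entry


def load_history(history_path):
    """Return all recorded attempts, oldest first (empty if no file yet)."""
    path = Path(history_path)
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

A reporting tool would only need `load_history()` or any JSON parser to consume the file; a database table would serve equally well if several processes write at once.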
Updated (Assignee)•12 years ago
Assignee: nobody → coop
Status: NEW → ASSIGNED
Priority: -- → P2
Comment 1•12 years ago
I believe this helps buildduty but I will remove the tag to get it out of the buildduty query.
Whiteboard: [buildduty][slaveduty][dashboard][kittenherder] → [slaveduty][dashboard][kittenherder]
Updated (Assignee)•12 years ago
Component: Release Engineering: Machine Management → Release Engineering: Developer Tools
QA Contact: armenzg → hwine
Comment 2 (Assignee)•12 years ago
https://github.com/mozilla/briar-patch/commit/5c701aaa0361978d9e576e91a675aacec47c871d
It doesn't track outcomes, but I'm not sure how we would properly verify that unless we looped on slave state after a reboot attempt. Reboot commands can return success without actually yielding a functional machine out the other side.
We can track this based on subsequent reboot attempts though, especially if we start iterating more quickly than every 6 hours.
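As a rough illustration of inferring outcomes from subsequent attempts: if the same slave has needed several reboots within a short window, the earlier reboots probably did not leave it healthy. The sketch below assumes each history entry is a dict with hypothetical `slave` and `timestamp` (epoch seconds) keys; the function name, window and threshold are likewise illustrative and not part of kittenherder.

```python
from collections import defaultdict


def slaves_with_repeated_reboots(history, window_seconds, threshold, now):
    """Return slaves with at least `threshold` attempts in the last window.

    `history` is an iterable of dicts with hypothetical "slave" and
    "timestamp" keys. Repeated attempts are treated as a hint that
    earlier reboots did not yield a working machine, not as proof.
    """
    counts = defaultdict(int)
    for entry in history:
        # Only count attempts that fall inside the look-back window.
        if now - entry["timestamp"] <= window_seconds:
            counts[entry["slave"]] += 1
    return sorted(slave for slave, n in counts.items() if n >= threshold)
```

Slaves returned here would be candidates for skipping on the next pass and for having a bug filed instead of being rebooted again.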
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Updated•12 years ago
Product: mozilla.org → Release Engineering
Updated•8 years ago
Component: Tools → General