Closed Bug 748563 Opened 13 years ago Closed 13 years ago

[briar-patch] Give concise, human-readable next steps for slaves needing recovery in the kitten emails

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: coop, Assigned: bear)

References

Details

(Whiteboard: [briarpatch][capacity][buildslaves][reporting])

I posted about Facebook's auto-remediation system back in the fall: https://www.facebook.com/notes/facebook-engineering/making-facebook-self-healing/10150275248698920

As much as possible, I want briar-patch to be working towards this goal. In almost all cases, we know what next steps a human should take for a particular slave. At the very least, we know a first step, e.g. "Is this slave enabled in slavealloc?"

Rather than display a list of previous states in the kitten emails, let's map those states to a human action and, where possible, provide a link so someone (buildduty) can start performing that action.

For example: I know that the kitten report is in some state of flux right now with the colo move, but two win64 slaves were consistently appearing in the "previously seen" category. By logging into those slaves, I was able to determine that auto-logon was not set up on them, so buildbot never got the chance to start. That kind of information should live in a state matrix somewhere so that the next time a win64 slave enters that state, we can tell buildduty: "Please VNC into this host to make sure auto-logon is set up."

In short, I want the report to show me actionable work, with a link to the dashboard that will allow me to drill down and get to the info that the report currently displays.
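A minimal sketch of the state matrix described above, written in Python as an assumption. The state names (`buildbot_never_started`, `not_seen_recently`), the dashboard URL, the hostnames, and the helpers `next_step` and `report_line` are all hypothetical and do not come from briar-patch. The idea is that each (platform, state) pair maps to a human-readable action, with a platform-agnostic fallback and a default when nothing is known yet.

```python
from typing import NamedTuple, Optional


class Action(NamedTuple):
    """A human-readable next step for buildduty, plus an optional link template."""
    text: str
    link_template: Optional[str] = None  # may contain "{host}"


# Hypothetical state names and dashboard URL; briar-patch's real labels may differ.
# Keys are (platform, state); a platform of None means "applies to any platform".
STATE_MATRIX = {
    ("win64", "buildbot_never_started"): Action(
        "Please VNC into this host to make sure auto-logon is set up."),
    (None, "not_seen_recently"): Action(
        "Is this slave enabled in slavealloc?",
        "https://dashboard.example.invalid/slave/{host}"),
}

DEFAULT_ACTION = Action("No known remediation for this state yet; investigate manually.")


def next_step(platform: str, state: str) -> Action:
    """Return the most specific known action: platform-specific first, then generic."""
    for key in ((platform, state), (None, state)):
        if key in STATE_MATRIX:
            return STATE_MATRIX[key]
    return DEFAULT_ACTION


def report_line(host: str, platform: str, state: str) -> str:
    """Format one actionable line for the kitten email, with a link when available."""
    action = next_step(platform, state)
    line = f"{host}: {action.text}"
    if action.link_template:
        line += f" ({action.link_template.format(host=host)})"
    return line


if __name__ == "__main__":
    # Hypothetical hostnames, for illustration only.
    print(report_line("example-w64-01", "win64", "buildbot_never_started"))
    print(report_line("example-linux-07", "linux", "not_seen_recently"))
```

Keeping the matrix as plain data means that whoever diagnoses a new failure mode (like the win64 auto-logon case) can add one entry, and every later report picks it up.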
I've made a first stab at HTML mail with some bug links here: https://github.com/ccooper/briar-patch/commit/ec362f277659a24e151dc239937a5bb26a3ea4eb
coop's changes have been merged and tested; doing a test run now on staging: https://github.com/mozilla/briar-patch/commit/ca1a41b981b72d7382283b1da6054be3155ad948
Whiteboard: [briar-patch][capacity][buildslaves][reporting] → [briarpatch][capacity][buildslaves][reporting]
Blocks: 786712
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → WONTFIX
Product: mozilla.org → Release Engineering
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard