Closed Bug 1025401 Opened 10 years ago Closed 10 years ago

Update the expected value for the mozilla-central tree status alert

Categories

(mozilla.org Graveyard :: Server Operations, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: philor, Assigned: ashish)

References

Details

(Whiteboard: :Moc)

<nagios-releng>: Fri 18:22:45 PDT [4994] treestatus.mozilla.org:Tree status - mozilla-central is CRITICAL: status: approval required != open () (http://m.mozilla.org/Tree+status+-+mozilla-central)

That's been the new expected status for the mozilla-central tree since June 2nd, so it has been alerting ever since, apart from the time or two that I've rage-acked it. Might not be the highest-value alert around.
I don't believe these alerts have much value. Trees are often closed (possibly more than 50% of the time) for non-infrastructure reasons. Until I have a chance to double back to bug 931542, there isn't a satisfactory way to tie these alerts to infra-only closures.

So for now, we should either:
a) Disable the alerts entirely until bug 931542 is fixed.
b) Leave the alert enabled, but set the acceptable states to "open" and "approval required" for *all trees*, since "approval required" is rarely used for infrastructure reasons.

My preference is for (a).
Blocks: 931079
Flags: needinfo?(ashish)
Acking it after every status change got boring, so I downtimed it for 100y.
I've downtimed the entire host, since we were still getting the pointless alerts for other trees (e.g. inbound):
downtime treestatus.mozilla.org 9999999d bug 1025401
The check was intended for IT to know when the trees were closed, not the sheriffs or buildduty. I've changed the alerting group to be the #sysadmins channel only.

I've also modified the check to set the default status of mozilla-central as "approval required"

Ashish: you may want to change the check to allow multiple values for the expected state, so that EITHER "open" or "approval required" counts as acceptable.
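A minimal sketch of the multi-value idea, assuming a Nagios-style plugin (exit code 0 = OK, 2 = CRITICAL) that is handed the tree name and its current status string by the caller. The names `EXPECTED_STATES`, `DEFAULT_EXPECTED` and `check_tree_status` are hypothetical; the real check script is not shown in this bug, and fetching the status from treestatus.mozilla.org is left out.

```python
import sys

# Standard Nagios plugin exit codes.
OK, CRITICAL = 0, 2

# Hypothetical per-tree acceptable states: any state in the set is treated as normal.
EXPECTED_STATES = {
    "mozilla-central": {"open", "approval required"},
}
# Trees without an explicit entry are only expected to be "open".
DEFAULT_EXPECTED = {"open"}


def check_tree_status(tree: str, status: str) -> tuple[int, str]:
    """Return (exit_code, message) for one tree's current status."""
    expected = EXPECTED_STATES.get(tree, DEFAULT_EXPECTED)
    normalized = status.strip().lower()
    if normalized in expected:
        return OK, f"Tree status - {tree} is OK: status: {normalized}"
    wanted = " | ".join(sorted(expected))
    return CRITICAL, f"Tree status - {tree} is CRITICAL: status: {normalized} not in ({wanted})"


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: check_tree_status.py TREE STATUS
    code, message = check_tree_status(sys.argv[1], sys.argv[2])
    print(message)
    sys.exit(code)
```

With a set per tree, option (b) above becomes a one-line change: give every tree the same `{"open", "approval required"}` set instead of only mozilla-central.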
(In reply to Amy Rich [:arich] [:arr] from comment #4)
> The check was intended for IT to know when the trees were closed, not the
> sheriffs or buildduty. 

Yeah, but due to comment 1 I don't think it's currently useful for IT either, given that 90% of the alerts are noise from non-infra tree closures. Note also that this alert only goes off after a human has manually closed a tree (the trees can't close on their own due to other alerts etc.), so that human will normally file an IT bug if the problem is IT.

> I've changed the alerting group to be the #sysadmins
> channel only.
> 
> I've also modified the check to set the default status of mozilla-central as
> "approval required"

That's great - thank you :-)
Blocks: 993044
Whiteboard: :Moc
Flags: needinfo?(ashish)
Fixing this the "right way", as :arr mentioned in Comment 4...
Assignee: server-ops → ashish
Status: NEW → ASSIGNED
Alright, I've modified the script to check the output against regexes, so the check will not alert if mozilla-central's status is "open" or "approval required".
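The regex approach can be sketched as follows. This is an illustration under assumptions, not the script that was actually changed: the pattern table, `DEFAULT_PATTERN` and `evaluate` are hypothetical, and the status string is assumed to be supplied by the caller.

```python
import re

# Standard Nagios plugin exit codes.
OK, CRITICAL = 0, 2

# Hypothetical per-tree patterns; the whole status string must match.
ACCEPTABLE_PATTERNS = {
    "mozilla-central": re.compile(r"open|approval required", re.IGNORECASE),
}
DEFAULT_PATTERN = re.compile(r"open", re.IGNORECASE)


def evaluate(tree: str, status: str) -> int:
    """Return the Nagios exit code for a tree's status."""
    pattern = ACCEPTABLE_PATTERNS.get(tree, DEFAULT_PATTERN)
    # fullmatch (not search) so that a status such as "reopened"
    # does not pass just because it contains "open" as a substring.
    return OK if pattern.fullmatch(status.strip()) else CRITICAL
```

Anchoring the match with `re.fullmatch` is the main design choice here: an unanchored `re.search` for `open` would also accept any status that merely contains that word.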
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard