Closed Bug 1326233 Opened 7 years ago Closed 6 years ago

nagios_blocker_checker.pl doesn't fail NRPE gracefully with bad inputs.

Categories

(bugzilla.mozilla.org :: General, defect, P5)

Production

RESOLVED FIXED

People

(Reporter: gcox, Unassigned)

Details

https://github.com/mozilla-bteam/bmo/blob/master/contrib/nagios_blocker_checker.pl

================================
$ sudo /data/bugzilla/www/bugzilla.mozilla.org/scripts/nagios_blocker_checker.pl --product 'Infrastructure & Operations' --component 'MOC: Incidents' --severity blocker

There is no component named 'MOC: Incidents' in the 'Infrastructure &
Operations' product.
================================

In Nagios this surfaces as "NRPE: Unable to read output", because the text the check returns is not properly formatted plugin output, instead of as a message that lets you quickly figure out what is wrong.

My quick diagnosis: this appears to be a by-product of the script calling Bugzilla::Product->check, which leads to Bugzilla::Error->ThrowUserError rendering a fairly generic template that is web-friendly but Nagios-unfriendly. That is about the point where I start saying "I have no graceful patch here without breaking the API into a lot of different calls" and ask whether you do.
The fix for this is not too bad:

Bugzilla->error_mode(ERROR_MODE_DIE) will cause that to be thrown as a real exception, which can be caught with try, something like:

# after https://github.com/mozilla-bteam/bmo/blob/master/contrib/nagios_blocker_checker.pl#L20
use Try::Tiny; # bmo ships with this nowadays
Bugzilla->error_mode(ERROR_MODE_DIE);

try {
    # all lines from
    # https://github.com/mozilla-bteam/bmo/blob/master/contrib/nagios_blocker_checker.pl#L119-L196
} catch {
    # print meaningful output.
};
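
For the catch block, here is a minimal sketch of what "meaningful output" could look like. It assumes the usual Nagios plugin convention of a single line on stdout plus an exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). It uses core eval rather than Try::Tiny so it runs without extra modules. The names nagios_line, guarded_check and run_check are hypothetical and not part of the script; run_check only stands in for the real check body by dying with the error text from the report above.

```perl
use strict;
use warnings;

# Standard Nagios plugin exit codes.
use constant {
    NAGIOS_OK       => 0,
    NAGIOS_WARNING  => 1,
    NAGIOS_CRITICAL => 2,
    NAGIOS_UNKNOWN  => 3,
};

# Hypothetical helper: squeeze an error message into one line of plugin output.
sub nagios_line {
    my ($label, $message) = @_;
    $message = '' unless defined $message;
    $message =~ s/^\s+|\s+$//g;    # trim leading and trailing whitespace
    $message =~ s/\s+/ /g;         # fold newlines and whitespace runs into one space
    $message =~ s/\|/-/g;          # '|' would start the perfdata section
    return "$label: $message";
}

# Hypothetical stand-in for the real check body. It dies with the same text
# the report shows, as an assumption about what ERROR_MODE_DIE would raise.
sub run_check {
    die "There is no component named 'MOC: Incidents' in the 'Infrastructure &\n"
      . "Operations' product.\n";
}

# Run a check and always return (exit code, one-line message),
# mapping any exception to UNKNOWN.
sub guarded_check {
    my ($check) = @_;
    my ($code, $line);
    my $ok = eval {
        ($code, $line) = $check->();
        1;
    };
    return ($code, $line) if $ok;
    my $error = "$@";    # stringify in case an exception object was thrown
    return (NAGIOS_UNKNOWN, nagios_line('UNKNOWN', $error));
}

my ($code, $line) = guarded_check(\&run_check);
print "$line\n";
# A real plugin would finish with: exit $code;
```

Mapping the failure to UNKNOWN rather than CRITICAL is a judgment call: a missing product or component says the check is misconfigured, not that a blocker bug exists.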
Offered up a patch in https://github.com/mozilla-bteam/bmo/pull/326
Aaaand, :dylan merged that PR. This bug was future-prevention: the issue only crops up when BMO components or products change, and none are planned in the areas IT is watching. So I'm calling this done, even though at present the fix is only committed upstream and not deployed on the admin host. It'll get to us eventually on a future deploy, and we shouldn't notice the problem in the meantime anyway.

Thanks!
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED