Closed Bug 789058 Opened 12 years ago Closed 12 years ago

New script for Nagios checking database checksums

Categories

(mozilla.org Graveyard :: Server Operations, task)

x86
macOS
task
Not set
major

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: scabral, Assigned: ashish)

Attachments

(1 file, 2 obsolete files)

Attached file: updated check_table_checksums (obsolete)
I have modified the code at:
https://github.com/palominodb/PalominoDB-Public-Code-Repository/blob/master/nagios/table_checksums/check_table_checksums.pl

To work with updated versions of pt-table-checksum, and to ignore system databases. Specifically, I have:

- taken "host" out of the SELECT statements for the checksum, as host is not captured by default from pt-table-checksum


- added an option, --ignore-sys or -b, to ignore system databases (hard-coded as mysql, INFORMATION_SCHEMA, PERFORMANCE_SCHEMA).
  *** Ideally this would be changed to options like --ignore-dbs and --ignore-tbls, each taking a comma-separated list of dbs or tables to ignore, but that's a 'nice-to-have', and perhaps PalominoDB will do it upstream after we send these patches back to them. (But if anyone has time, I'm happy to have those features, in which case --ignore-sys becomes --ignore-dbs mysql,INFORMATION_SCHEMA,PERFORMANCE_SCHEMA.)
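
For illustration only, here is a rough sketch of how a mismatch query without the "host" column and with the --ignore-sys filter might be assembled. The helper name build_mismatch_query is hypothetical, and the WHERE condition is the difference query given in the pt-table-checksum documentation, not necessarily what check_table_checksums.pl runs; it assumes the documented --replicate table layout keyed on (db, tbl, chunk).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not taken from check_table_checksums.pl.
# $table is the checksum table (for example percona.checksums);
# $ignore_sys mirrors the --ignore-sys flag described above.
sub build_mismatch_query {
    my ($table, $ignore_sys) = @_;

    # Hard-coded system schemas, as listed in the patch description.
    my $ignore_dbs = "'mysql','INFORMATION_SCHEMA','PERFORMANCE_SCHEMA'";
    my $and_clause = $ignore_sys ? " AND db NOT IN ($ignore_dbs)" : '';

    # Assumption: no "host" column, rows are grouped by db and tbl only.
    return "SELECT db, tbl, COUNT(*) AS chunks FROM $table"
         . " WHERE (master_cnt <> this_cnt OR master_crc <> this_crc"
         . " OR ISNULL(master_crc) <> ISNULL(this_crc))"
         . $and_clause
         . " GROUP BY db, tbl";
}

print build_mismatch_query('percona.checksums', 1), "\n";
```

Any row this query returns is a table whose checksums differ on the replica, so an empty result would map to OK.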


If you just want to see the diff, you can download the original script at https://raw.github.com/palominodb/PalominoDB-Public-Code-Repository/master/nagios/table_checksums/check_table_checksums.pl to compare.
tl;dr: Looks fine to me, nothing glaring, and I'm not going to nitpick style.

Line 13 says unknown returns 2.  It should return 3, but that's an upstream bug.

Line numbers refer to the upgraded script.
Nitpicking: Lines 144 and 158 print your and_clause variable, which could make this chattier than necessary, particularly on 144 where it follows a newline.  Since it's a nagios check, you might not ever see it, so if it's important, you might need to shuffle the failure case printout around.

If you wanted to do ignore dbs as a customizable variable, the secret sauce is up around line 38, where they say vArNaMe=s, meaning varname expects a 's'tring associated with it.  That's easy.  The tougher part, of course, is to let Little Bobby Tables check the inputs.  They don't seem to be doing that anyway on the --table variable, so, maybe they're considering it skippable.
Thanx about Line 13 :D um, 2 = critical right?

As for the chattiness, the idea is to make sure when people see "OK" or "WARNING/CRITICAL" they know that the check didn't include those DBs. So OK might be "OK but we didn't check x,y,z"

Is there a magic letter for a comma-separated list and an easy way to go through each one to quote it? e.g.

--ignore-dbs a,b,c 

needs to turn into

AND db not in ('a','b','c')

So that's the hard part IMO.

I'm not worried about SQL injection, because this is a nagios check, and if you can screw around with our nagios, it's over anyway.
'2' is critical.

OK, ignore my chattiness concern.  Hadn't realized the one-line-only limit was lifted.

As to option processing for DB stuff, stealing from http://perldoc.perl.org/Getopt/Long.html#Options-with-multiple-values :

gcox@fibbsbozza:~$ ./test.pl 
gcox@fibbsbozza:~$ ./test.pl --ignore-db foo --ignore-db bar --ignore-db baz
 AND db not in ('foo','bar','baz')
gcox@fibbsbozza:~$ ./test.pl --ignore-db foo,bar --ignore-db baz
 AND db not in ('foo','bar','baz')
gcox@fibbsbozza:~$ ./test.pl --ignore-db foo,bar,baz
 AND db not in ('foo','bar','baz')

gcox@fibbsbozza:~$ cat test.pl 
#!/usr/bin/perl -w
use strict;
use Getopt::Long;

my @ignore_dbs = ();
GetOptions ("ignore-dbs=s" => \@ignore_dbs);
@ignore_dbs = split(/,/, join(',',@ignore_dbs));
my $and_clause = @ignore_dbs ? ' AND db not in ('.join(',',map {"'$_'"} @ignore_dbs).')' : '';

print $and_clause."\n" if ($and_clause);
Well, that's good. It's a bit more work, but it's good work for tomorrow's "no change Friday"
OK, I took out the ignore-sys and put in an ignore-db option. Anything glaringly wrong here? I'm attaching the new file, and here's the diff from the previous version.

[root@tp-bugs01-master01 bin]# diff working.pl check_table_checksums.pl 
26,27d25
< my $ignore_sys = 0;
< 
30c28
< my $ignore_dbs="'mysql','INFORMATION_SCHEMA','PERFORMANCE_SCHEMA'";
---
> my $ignore_dbs='';
42c40
<   'ignore-sys|b' => \$ignore_sys,
---
>   'ignore-db|b=s' => \$ignore_dbs,
59c57
<   --ignore-sys,-b Ignore system databases/tables like mysql, INFORMATION_SCHEMA, etc. 
---
>   --ignore-db,-b Ignore databases 
76,77c74,78
< $and_clause = $ignore_sys ? " AND db not in ($ignore_dbs) " : "" ;
< #print "$ignore_sys is ignore-sys\n $and_clause = and clause";
---
> if ($ignore_dbs) {
>   $and_clause .= "AND db NOT IN ('";
>   $and_clause .=join("','",split(/,/,$ignore_dbs));
>   $and_clause .= "')";
> }
All I have is nitpicking  (Maybe better help on the syntax of the -b option.  Maybe a FIXME comment for the future that the input isn't injection-sanitized).

I say "Ship it."
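
If someone later wants to act on that FIXME, one possible approach is to reject anything that does not look like a plain schema name before building the clause. This is only a sketch: the helper name ignore_clause is hypothetical, and the allowed character set (letters, digits, underscore, dollar sign) is an assumption that would refuse some legal but unusual MySQL database names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn "a,b,c" into " AND db NOT IN ('a','b','c')".
# Assumption: names outside [A-Za-z0-9_$] are rejected rather than escaped.
sub ignore_clause {
    my ($csv) = @_;
    return '' unless defined $csv && length $csv;

    # Drop empty entries such as the one produced by "a,,b".
    my @dbs = grep { length } split /,/, $csv;
    for my $db (@dbs) {
        die "invalid database name: $db\n"
            unless $db =~ /\A[A-Za-z0-9_\$]+\z/;
    }
    return '' unless @dbs;
    return " AND db NOT IN ('" . join("','", @dbs) . "')";
}

print ignore_clause('mysql,INFORMATION_SCHEMA,PERFORMANCE_SCHEMA'), "\n";
```

In a real check the die would presumably be caught and reported as UNKNOWN instead of letting Perl exit with its own status.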
Help now says:
   --ignore-db,-b Ignore these databases (comma separated list)

And I changed 
use constant UNKNOWN  => 2;
to
use constant UNKNOWN  => 3;
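
For reference, a small sketch of the standard Nagios plugin return codes together with a status line that names the skipped databases, as discussed earlier in this bug. The status_line helper and the message wording are hypothetical and are not taken from the attached script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Standard Nagios plugin return codes.
use constant OK       => 0;
use constant WARNING  => 1;
use constant CRITICAL => 2;
use constant UNKNOWN  => 3;

my @STATE_NAME = ('OK', 'WARNING', 'CRITICAL', 'UNKNOWN');

# Hypothetical helper: build the status text, always naming the ignored
# databases so an OK result is not mistaken for full coverage.
sub status_line {
    my ($state, $message, @ignored) = @_;
    my $line = "$STATE_NAME[$state] - $message";
    $line .= ' (not checked: ' . join(',', @ignored) . ')' if @ignored;
    return $line;
}

my $state = OK;
print status_line($state, 'no checksum differences', 'mysql', 'percona'), "\n";
# A real check would finish with: exit $state;
```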
Attachment #658880 - Attachment is obsolete: true
Attachment #659263 - Attachment is obsolete: true
Please put the attached check into Nagios:

It should be run with the following options:

check_table_checksums.pl --user nagiosdaemon --password **ELIDED** -T percona.checksums -I 24 -H $HOSTNAME$ -b mysql,INFORMATION_SCHEMA,PERFORMANCE_SCHEMA

At first this should be run against the production phoenix bugzilla servers:

tp-bugs01-master01.phx.mozilla.com
tp-bugs01-slave01.phx.mozilla.com
tp-bugs01-slave02.phx.mozilla.com
tp-bugs01-slave03.phx.mozilla.com

I would make a group (service group?) called mysql-checksum to put this in, because we're going to be adding more machines in the future.

This check should e-mail infra-dbnotices, but NOT page. It can be run every few hours; there's no need to run it every 5 minutes.

(please don't add this check on a Friday)
Summary: Verify perl script to check checksums on slaves in Nagios → New script for Nagios checking database checksums
Upping the importance, this is the last step in a q3 goal for the DB team.
Severity: normal → major
Assignee: server-ops → ashish
Added these to Nagios:

https://nagios.mozilla.org/phx1/cgi-bin/status.cgi?hostgroup=mysql-checksum&style=detail
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Please make sure that:

-b mysql,INFORMATION_SCHEMA,PERFORMANCE_SCHEMA,percona

is in the check (this should be configurable, but probably won't change a ton)

and

This check should e-mail infra-dbnotices, but NOT page. It can be run every few hours; there's no need to run it every 5 minutes.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
all better, thanx for fixing ashish!
Status: REOPENED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard