Bug 789058 - New script for Nagios checking database checksums
Product: Graveyard
Classification: Graveyard
Component: Server Operations
Version: other
Platform: x86 Mac OS X
Importance: -- major
Target Milestone: ---
Assigned To: Ashish Vijayaram [:ashish]
QA Contact: Justin Dow [:jabba]
Depends on:
Blocks: 774162
Reported: 2012-09-06 07:52 PDT by Sheeri Cabral [:sheeri]
Modified: 2015-03-12 08:17 PDT

updated check_table_checksums (4.56 KB, text/x-perl-script)
2012-09-06 07:52 PDT, Sheeri Cabral [:sheeri]
updated version, can take a list of dbs to ignore (4.45 KB, text/x-perl-script)
2012-09-07 08:11 PDT, Sheeri Cabral [:sheeri]
fixed UNKNOWN to be status 3, added better help (4.47 KB, text/x-perl-script)
2012-09-07 10:46 PDT, Sheeri Cabral [:sheeri]

Description Sheeri Cabral [:sheeri] 2012-09-06 07:52:14 PDT
Created attachment 658880
updated check_table_checksums

I have modified the code at:

To work with updated versions of pt-table-checksum, and to ignore system databases. Specifically, I have:

- taken "host" out of the SELECT statements for the checksum, as host is not captured by default from pt-table-checksum

- added an option, --ignore-sys or -b, to ignore system databases (hard-coded as mysql, INFORMATION_SCHEMA, PERFORMANCE_SCHEMA).
  *** Ideally this would be replaced by options like --ignore-dbs and --ignore-tbls, each taking a comma-separated list of databases or tables to ignore. That's a 'nice-to-have', and perhaps PalominoDB will do it upstream after we send these patches back to them. (If anyone has time, I'm happy to have those features, in which case --ignore-sys becomes --ignore-dbs mysql,INFORMATION_SCHEMA,PERFORMANCE_SCHEMA.)

If you just want to see the diff, you can download the original script at to compare.
Comment 1 Greg Cox [:gcox] 2012-09-06 08:23:51 PDT
tl;dr: Looks fine to me, nothing glaring, and I'm not going to nitpick style.

Line 13 says unknown returns 2.  It should return 3, but that's an upstream bug.

Line numbers refer to the upgraded script.
Nitpicking: Lines 144 and 158 print your and_clause variable, which could make this chattier than necessary, particularly on line 144, where it's printed after a newline.  Since it's a Nagios check, you might never see that output, so if it's important, you might need to shuffle the failure-case printout around.

If you wanted to do ignore dbs as a customizable variable, the secret sauce is up around line 38, where they say vArNaMe=s, meaning varname expects a 's'tring associated with it.  That's easy.  The tougher part, of course, is to let Little Bobby Tables check the inputs.  They don't seem to be doing that anyway on the --table variable, so, maybe they're considering it skippable.
Comment 2 Sheeri Cabral [:sheeri] 2012-09-06 09:25:53 PDT
Thanx about Line 13 :D um, 2 = critical right?

As for the chattiness, the idea is to make sure when people see "OK" or "WARNING/CRITICAL" they know that the check didn't include those DBs. So OK might be "OK but we didn't check x,y,z"
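
The "OK but we didn't check x,y,z" idea could look roughly like the sketch below. This is only an illustration: status_line is a hypothetical helper, not a function in the attached script, and the message wording is an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the attached script): build a one-line
# status message and append the ignored databases, if any, so the
# reader knows which ones the check skipped.
sub status_line {
    my ($status, $detail, @ignored) = @_;
    my $line = "$status: $detail";
    $line .= ' (not checked: ' . join(',', @ignored) . ')' if @ignored;
    return $line;
}

# Prints: OK: all checksums match (not checked: mysql,INFORMATION_SCHEMA)
print status_line('OK', 'all checksums match', 'mysql', 'INFORMATION_SCHEMA'), "\n";
```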

Is there a magic letter for a comma-separated list and an easy way to go through each one to quote it? e.g.

--ignore-dbs a,b,c 

needs to turn into

AND db not in ('a','b','c')

So that's the hard part IMO.

I'm not worried about SQL injection, because this is a nagios check, and if you can screw around with our nagios, it's over anyway.
Comment 3 Greg Cox [:gcox] 2012-09-06 09:45:40 PDT
'2' is critical.

OK, ignore my chattiness concern.  Hadn't realized the one-line-only limit was lifted.

As to option processing for DB stuff, stealing from :

gcox@fibbsbozza:~$ ./ 
gcox@fibbsbozza:~$ ./ --ignore-db foo --ignore-db bar --ignore-db baz
 AND db not in ('foo','bar','baz')
gcox@fibbsbozza:~$ ./ --ignore-db foo,bar --ignore-db baz
 AND db not in ('foo','bar','baz')
gcox@fibbsbozza:~$ ./ --ignore-db foo,bar,baz
 AND db not in ('foo','bar','baz')

gcox@fibbsbozza:~$ cat 
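#!/usr/bin/perl -w
use strict;
use Getopt::Long;

# Collect every --ignore-dbs value; the option may be given more than once.
# Getopt::Long's default auto-abbreviation lets --ignore-db match it too.
my @ignore_dbs = ();
GetOptions ("ignore-dbs=s" => \@ignore_dbs);
# Flatten repeated and comma-separated values into a single list.
@ignore_dbs = split(/,/, join(',',@ignore_dbs));
# Quote each name; the clause stays empty when nothing is ignored.
my $and_clause = @ignore_dbs ? ' AND db not in ('.join(',',map {"'$_'"} @ignore_dbs).')' : '';

print $and_clause."\n" if ($and_clause);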
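
If input checking ever becomes worth doing, one possible approach is to reject any name that is not a plain identifier before quoting it. The sketch below is an assumption-laden illustration, not the upstream behaviour: build_and_clause is a hypothetical helper, and it assumes database names contain only ASCII letters, digits, and underscores (MySQL itself allows more in quoted identifiers).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the attached script): build the
# " AND db NOT IN (...)" clause, dying on any name that contains
# characters outside the assumed safe set.
sub build_and_clause {
    my (@raw) = @_;
    # Flatten repeated and comma-separated values; drop empty entries.
    my @dbs = grep { length } split /,/, join(',', @raw);
    return '' unless @dbs;
    for my $db (@dbs) {
        # Assumption: names are limited to letters, digits, underscores.
        die "invalid database name: $db\n"
            unless $db =~ /\A[A-Za-z0-9_]+\z/;
    }
    return ' AND db NOT IN (' . join(',', map { "'$_'" } @dbs) . ')';
}

# Prints:  AND db NOT IN ('foo','bar','baz')
print build_and_clause('foo,bar', 'baz'), "\n";
```

A name such as x') OR 1=1 -- fails the pattern and stops the script before any SQL is built.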
Comment 4 Sheeri Cabral [:sheeri] 2012-09-06 10:41:43 PDT
Well, that's good. It's a bit more work, but it's good work for tomorrow's "no change Friday".
Comment 5 Sheeri Cabral [:sheeri] 2012-09-07 08:10:12 PDT
OK, I took out the ignore-sys and put in an ignore-db option. Anything glaringly wrong here? I'm attaching the new file, and here's the diff from the previous version.

[root@tp-bugs01-master01 bin]# diff 
< my $ignore_sys = 0;
< my $ignore_dbs="'mysql','INFORMATION_SCHEMA','PERFORMANCE_SCHEMA'";
> my $ignore_dbs='';
<   'ignore-sys|b' => \$ignore_sys,
>   'ignore-db|b=s' => \$ignore_dbs,
<   --ignore-sys,-b Ignore system databases/tables like mysql, INFORMATION_SCHEMA, etc. 
>   --ignore-db,-b Ignore databases 
< $and_clause = $ignore_sys ? " AND db not in ($ignore_dbs) " : "" ;
< #print "$ignore_sys is ignore-sys\n $and_clause = and clause";
> if ($ignore_dbs) {
>   $and_clause .= "AND db NOT IN ('";
>   $and_clause .=join("','",split(/,/,$ignore_dbs));
>   $and_clause .= "')";
> }
Comment 6 Sheeri Cabral [:sheeri] 2012-09-07 08:11:55 PDT
Created attachment 659263
updated version, can take a list of dbs to ignore
Comment 7 Greg Cox [:gcox] 2012-09-07 09:45:54 PDT
All I have is nitpicking: maybe better help on the syntax of the -b option, and maybe a FIXME comment for the future noting that the input isn't injection-sanitized.

I say "Ship it."
Comment 8 Sheeri Cabral [:sheeri] 2012-09-07 10:46:23 PDT
Created attachment 659296
fixed UNKNOWN to be status 3, added better help

Help now says:
   --ignore-db,-b Ignore these databases (comma separated list)

And I changed
use constant UNKNOWN  => 2;
to
use constant UNKNOWN  => 3;
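
For reference, the standard Nagios plugin return codes are 0 (OK), 1 (WARNING), 2 (CRITICAL), and 3 (UNKNOWN). A minimal sketch of how a check might use them follows; the report helper and the message text are hypothetical and are not taken from the attached script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Standard Nagios plugin return codes.
use constant OK       => 0;
use constant WARNING  => 1;
use constant CRITICAL => 2;
use constant UNKNOWN  => 3;

# Status names indexed by return code.
my @STATUS_NAME = ('OK', 'WARNING', 'CRITICAL', 'UNKNOWN');

# Hypothetical helper: print a one-line status message and return the
# code so the caller can pass it to exit.
sub report {
    my ($code, $message) = @_;
    print "$STATUS_NAME[$code]: $message\n";
    return $code;
}

# Prints: UNKNOWN: could not read checksum table
my $code = report(UNKNOWN, 'could not read checksum table');
# A real plugin would finish with: exit $code;
```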
Comment 9 Sheeri Cabral [:sheeri] 2012-09-07 11:00:16 PDT
Please put the attached check into Nagios:

It should be run with the following options: --user nagiosdaemon --password **ELIDED** -T percona.checksums -I 24 -H $HOSTNAME$ -b mysql,INFORMATION_SCHEMA,PERFORMANCE_SCHEMA

At first this should be run against the production phoenix bugzilla servers:

I would make a group (service group?) called mysql-checksum to put this in, because we're going to be adding more machines in the future.

This check should e-mail infra-dbnotices, but NOT page. It can be run every few hours; there's no need to run it every 5 minutes.

(please don't add this check on a Friday)
Comment 10 Sheeri Cabral [:sheeri] 2012-09-11 09:57:13 PDT
Upping the importance; this is the last step in a Q3 goal for the DB team.
Comment 11 Ashish Vijayaram [:ashish] 2012-09-12 08:42:27 PDT
Added these to Nagios:
Comment 12 Sheeri Cabral [:sheeri] 2012-09-12 08:53:52 PDT
Please make sure that:


is in the check (this should be configurable, but probably won't change a ton)


This check should e-mail infra-dbnotices, but NOT page. It can be run every few hours; there's no need to run it every 5 minutes.
Comment 13 Sheeri Cabral [:sheeri] 2012-09-12 10:06:23 PDT
all better, thanx for fixing ashish!
