New script for Nagios checking database checksums


Status Graveyard
Server Operations
5 years ago
3 years ago


(Reporter: sheeri, Assigned: ashish)




(1 attachment, 2 obsolete attachments)



5 years ago
Created attachment 658880 [details]
updated check_table_checksums

I have modified the code at:

To work with updated versions of pt-table-checksum, and to ignore system databases. Specifically, I have:

- taken "host" out of the SELECT statements for the checksum, as host is not captured by default by pt-table-checksum

- added an option, --ignore-sys or -b, to ignore system databases (hard-coded as mysql, INFORMATION_SCHEMA, PERFORMANCE_SCHEMA).
  *** Ideally this would be changed to options like --ignore-dbs and --ignore-tbls, which would take comma-separated lists of dbs and tables to ignore, but that's a 'nice-to-have', and perhaps PalominoDB will do it upstream after we send these patches back to them. (But if anyone has time, I'm happy to have those features, in which case --ignore-sys becomes --ignore-dbs mysql,INFORMATION_SCHEMA,PERFORMANCE_SCHEMA.)

If you just want to see the diff, you can download the original script at to compare.

Comment 1

5 years ago
tl;dr: Looks fine to me, nothing glaring, and I'm not going to nitpick style.

Line 13 says unknown returns 2.  It should return 3, but that's an upstream bug.

Line numbers refer to the upgraded script.
Nitpicking: Lines 144 and 158 print your and_clause variable, which could make this chattier than necessary, particularly on 144 where it's on a new line.  Since it's a nagios check, you might not ever see it, so if it's important, you might need to shuffle the failure case printout around.

If you wanted to do ignore dbs as a customizable variable, the secret sauce is up around line 38, where they say vArNaMe=s, meaning varname expects a 's'tring associated with it.  That's easy.  The tougher part, of course, is to let Little Bobby Tables check the inputs.  They don't seem to be doing that anyway on the --table variable, so, maybe they're considering it skippable.
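
For illustration, here is a minimal sketch of what the "=s" spec does in Getopt::Long. The helper name parse_ignore_dbs is hypothetical and not part of the script; the option name and the -b alias are taken from this bug. It uses GetOptionsFromArray so it can be tried without touching @ARGV.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper (not in the attached script): parse a list of
# command-line words and return the string given to --ignore-dbs,
# or '' if the option was not supplied.
sub parse_ignore_dbs {
    my @args = @_;
    my $ignore_dbs = '';
    # "=s" means the option requires a string argument;
    # "|b" adds -b as a short alias.
    GetOptionsFromArray(\@args, 'ignore-dbs|b=s' => \$ignore_dbs)
        or die "bad options\n";
    return $ignore_dbs;
}

print parse_ignore_dbs('--ignore-dbs', 'mysql,test'), "\n";
```

The value comes back as one raw string; splitting it on commas and quoting each piece is a separate step.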

Comment 2

5 years ago
Thanx about Line 13 :D um, 2 = critical right?

As for the chattiness, the idea is to make sure when people see "OK" or "WARNING/CRITICAL" they know that the check didn't include those DBs. So OK might be "OK but we didn't check x,y,z"

Is there a magic letter for a comma-separated list and an easy way to go through each one to quote it? e.g.

--ignore-dbs a,b,c 

needs to turn into

AND db not in ('a','b','c')

So that's the hard part IMO.

I'm not worried about SQL injection, because this is a nagios check, and if you can screw around with our nagios, it's over anyway.

Comment 3

5 years ago
'2' is critical.

OK, ignore my chattiness concern.  Hadn't realized the one-line-only limit was lifted.

As to option processing for DB stuff, stealing from :

gcox@fibbsbozza:~$ ./ 
gcox@fibbsbozza:~$ ./ --ignore-db foo --ignore-db bar --ignore-db baz
 AND db not in ('foo','bar','baz')
gcox@fibbsbozza:~$ ./ --ignore-db foo,bar --ignore-db baz
 AND db not in ('foo','bar','baz')
gcox@fibbsbozza:~$ ./ --ignore-db foo,bar,baz
 AND db not in ('foo','bar','baz')

gcox@fibbsbozza:~$ cat 
#!/usr/bin/perl -w
use strict;
use Getopt::Long;

my @ignore_dbs = ();
GetOptions ("ignore-dbs=s" => \@ignore_dbs);
@ignore_dbs = split(/,/, join(',',@ignore_dbs));
my $and_clause = @ignore_dbs ? ' AND db not in ('.join(',',map {"'$_'"} @ignore_dbs).')' : '';

print $and_clause."\n" if ($and_clause);

Comment 4

5 years ago
Well, that's good. It's a bit more work, but it's good work for tomorrow's "no change Friday"

Comment 5

5 years ago
OK, I took out the ignore-sys and put in an ignore-db option. Anything glaringly wrong here? I'm attaching the new file, and here's the diff from the previous version.

[root@tp-bugs01-master01 bin]# diff 
< my $ignore_sys = 0;
< my $ignore_dbs="'mysql','INFORMATION_SCHEMA','PERFORMANCE_SCHEMA'";
> my $ignore_dbs='';
<   'ignore-sys|b' => \$ignore_sys,
>   'ignore-db|b=s' => \$ignore_dbs,
<   --ignore-sys,-b Ignore system databases/tables like mysql, INFORMATION_SCHEMA, etc. 
>   --ignore-db,-b Ignore databases 
< $and_clause = $ignore_sys ? " AND db not in ($ignore_dbs) " : "" ;
< #print "$ignore_sys is ignore-sys\n $and_clause = and clause";
> if ($ignore_dbs) {
>   $and_clause .= "AND db NOT IN ('";
>   $and_clause .=join("','",split(/,/,$ignore_dbs));
>   $and_clause .= "')";
> }

Comment 6

5 years ago
Created attachment 659263 [details]
updated version, can take a list of dbs to ignore

Comment 7

5 years ago
All I have is nitpicking  (Maybe better help on the syntax of the -b option.  Maybe a FIXME comment for the future that the input isn't injection-sanitized).

I say "Ship it."
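
If someone does pick up that FIXME later, one possible approach is to reject any database name that contains unexpected characters before building the clause. This is only a sketch: build_and_clause is a hypothetical helper, and the allowed character set is an assumption that is stricter than what MySQL accepts in quoted identifiers.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the attached script: turn a
# comma-separated list of database names into an SQL fragment, and die
# on any name containing characters outside A-Z, a-z, 0-9, _ and $.
sub build_and_clause {
    my ($ignore_dbs) = @_;
    $ignore_dbs = '' unless defined $ignore_dbs;
    # Drop empty entries such as those produced by "a,,b".
    my @dbs = grep { length } split /,/, $ignore_dbs;
    return '' unless @dbs;
    for my $db (@dbs) {
        die "invalid database name: $db\n"
            unless $db =~ /\A[A-Za-z0-9_\$]+\z/;
    }
    return " AND db NOT IN (" . join(',', map { "'$_'" } @dbs) . ")";
}

print build_and_clause('mysql,INFORMATION_SCHEMA,PERFORMANCE_SCHEMA'), "\n";
```

Dying on bad input is a design choice made for this sketch; a Nagios check would more likely report UNKNOWN and exit with that status.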

Comment 8

5 years ago
Created attachment 659296 [details]
fixed UNKNOWN to be status 3, added better help

Help now says:
   --ignore-db,-b Ignore these databases (comma separated list)

And I changed
use constant UNKNOWN  => 2;
to
use constant UNKNOWN  => 3;
Attachment #658880 - Attachment is obsolete: true
Attachment #659263 - Attachment is obsolete: true
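
For reference, the standard Nagios plugin return codes are 0 for OK, 1 for WARNING, 2 for CRITICAL and 3 for UNKNOWN. A minimal sketch of how a Perl check might declare and use them follows. Only the UNKNOWN constant is quoted from the script in this bug; the other constants, the lookup table and the status_line helper are assumptions for illustration.

```perl
use strict;
use warnings;

# Standard Nagios plugin return codes.
use constant OK       => 0;
use constant WARNING  => 1;
use constant CRITICAL => 2;
use constant UNKNOWN  => 3;

# Map each code to its label. The trailing () stops "=>" from quoting
# the constant name as a string.
my %STATUS_NAME = (
    OK()       => 'OK',
    WARNING()  => 'WARNING',
    CRITICAL() => 'CRITICAL',
    UNKNOWN()  => 'UNKNOWN',
);

# Hypothetical helper: format the status line a check prints.
# Unrecognised codes are labelled UNKNOWN.
sub status_line {
    my ($code, $message) = @_;
    my $name = exists $STATUS_NAME{$code} ? $STATUS_NAME{$code} : 'UNKNOWN';
    return "$name: $message";
}

print status_line(OK, 'checksums match'), "\n";
# A real plugin would then finish with: exit $code;
```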

Comment 9

5 years ago
Please put the attached check into Nagios:

It should be run with the following options: --user nagiosdaemon --password **ELIDED** -T percona.checksums -I 24 -H $HOSTNAME$ -b mysql,INFORMATION_SCHEMA,PERFORMANCE_SCHEMA

At first this should be run against the production phoenix bugzilla servers:

I would make a group (service group?) called mysql-checksum to put this in, because we're going to be adding more machines in the future.

This check should e-mail infra-dbnotices, but NOT page. It can be run every few hours; there's no need to run it every 5 minutes.

(please don't add this check on a Friday)
Summary: Verify perl script to check checksums on slaves in Nagios → New script for Nagios checking database checksums

Comment 10

5 years ago
Upping the importance, this is the last step in a q3 goal for the DB team.
Severity: normal → major


5 years ago
Assignee: server-ops → ashish

Comment 11

5 years ago
Added these to Nagios:
Last Resolved: 5 years ago
Resolution: --- → FIXED

Comment 12

5 years ago
Please make sure that:


is in the check (this should be configurable, but probably won't change a ton)


This check should e-mail infra-dbnotices, but NOT page. It can be run every few hours; there's no need to run it every 5 minutes.
Resolution: FIXED → ---

Comment 13

5 years ago
all better, thanx for fixing ashish!
Last Resolved: 5 years ago
Resolution: --- → FIXED
Product: → Graveyard