Closed Bug 793696 Opened 13 years ago Closed 13 years ago

Put checksumming scripts in puppet

Categories

(Data & BI Services Team :: DB: MySQL, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dustin, Assigned: scabral)

Bug 774162 got finished without *gasp* putting the script in puppet. With an eye toward bug 90749 and its brethren, we should get that script installed into the mysql2 module.
And it should actually be put in a cron for the bugzilla servers while we're at it.
Assignee: server-ops-database → scabral
OK, I dropped this in puppet for tp-bugs01-slave01 and puppetagain1. Let's see what happens.
Still to do:
* generate a better report format for discrepancies
* turn this into a nagios check
* figure out performance knobs required for other clusters
* run this everywhere
Heh, the crontab has some bugs, but luckily 2>&1 >/dev/null makes those invisible :)
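The order of the two redirections in that comment matters. The crontab entry itself is not shown in this bug, so which order it actually used is unknown; the sketch below only illustrates the difference, using a hypothetical `noisy` function as a stand-in for the cron job.

```shell
# Hypothetical stand-in for a cron job that writes to both streams.
noisy() {
    echo "normal output"
    echo "error output" >&2
}

# Order as quoted in the comment: stderr is pointed at the current stdout
# first, and only then is stdout sent to /dev/null. Errors stay visible.
leaky=$(noisy 2>&1 >/dev/null)

# Reversed order: stdout goes to /dev/null first, then stderr follows it.
# Both streams are discarded.
silent=$(noisy >/dev/null 2>&1)

echo "2>&1 >/dev/null  -> '${leaky}'"
echo ">/dev/null 2>&1  -> '${silent}'"
```

With the first form cron would still mail stderr; only the second form hides errors completely.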
OK, bugs fixed. Checksums are now running on:
- bugzilla in phx1
- puppetagain in scl3
- dev in scl3
Sorry for the noise on dbnotices over the weekend. I introduced some whitespace errors :(
I added the script to what will soon be the devtools cluster, for bug 796936.
An old version of percona-toolkit, 2.0.4, is installed on dev1 (and was on a few others). Fixed with yum upgrade. And we're still seeing --- Cannot connect to h=10.22.70.29,p=...,u=checksum --- in the emails even when no checksum errors occur. I'll need to figure out how to filter that out, either by filtering the output, or indicating that the backup hosts should not be checked.
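One of the two options mentioned, filtering the output, could look roughly like the sketch below. This is not the fix that was eventually applied (the bug does not show it); `filter_checksum_output` is a hypothetical helper, and the second sample line is a placeholder rather than real tool output.

```shell
# Hypothetical filter: drop the known-harmless connection message and
# pass every other line through unchanged.
filter_checksum_output() {
    # grep -v exits non-zero when nothing is left; that is not an error here.
    grep -v 'Cannot connect to' || true
}

# First line is the message quoted in this bug; second is a placeholder.
sample='Cannot connect to h=10.22.70.29,p=...,u=checksum
placeholder line that should still be reported'

filtered=$(printf '%s\n' "$sample" | filter_checksum_output)
printf '%s\n' "$filtered"
```

A filter this broad would also hide connection failures to hosts that should be reachable, which is one reason excluding the backup hosts from the check may be the safer option.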
We can ignore those errors. Since the nagios check checks for freshness, do we really need the output of the script e-mailed to us, as opposed to logged to a file on the master? (so if anything goes wrong we can look at the file).
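The nagios check itself is not shown in this bug. As a rough sketch of what a freshness check on a log file could look like, assuming the cron job rewrites a file on each run: the function name, file, and threshold below are all hypothetical. The exit codes follow the usual nagios plugin convention (0 for OK, 2 for CRITICAL).

```shell
# Hypothetical freshness check: reports CRITICAL if the file is missing
# or has not been modified within the last max_age_min minutes.
check_freshness() {
    file=$1
    max_age_min=$2
    if [ ! -e "$file" ]; then
        echo "CRITICAL: $file does not exist"
        return 2
    fi
    # find prints the path only if it was modified more than N minutes ago.
    if [ -n "$(find "$file" -mmin +"$max_age_min")" ]; then
        echo "CRITICAL: $file not updated in over $max_age_min minutes"
        return 2
    fi
    echo "OK: $file updated within the last $max_age_min minutes"
    return 0
}

# Demonstration on a temporary file standing in for the checksum log.
tmp=$(mktemp)
fresh_status=0
check_freshness "$tmp" 1440 || fresh_status=$?

touch -t 200001010000 "$tmp"   # backdate the file to simulate a stalled cron
stale_status=0
check_freshness "$tmp" 1440 || stale_status=$?
rm -f "$tmp"
```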
There's no nagios check yet, so I'm pretty sure it's not checking for freshness :)
Oh hey, check that out. I had no idea. I enabled that for the other clusters. The four clusters running this are:
- bugzilla
- dev in scl3
- puppetagain in scl3
- b1 (devtools) in scl3
I also fixed the "Cannot reach.." mails. Hopefully we won't see any email unless the checksum fails; the nagios alert will then make sure we attend to such emails. How many clusters do we want to put this on? Should we do all but a few high-load clusters, or just add clusters as desired?
We don't need email at all; if the checksum fails, the nagios check will page us. Are you putting the check on the machine, or also adding a cron script? We have Q4 goals to put the check + cron + nagios on sumo and addons (production), so for now I'm OK with putting the script itself on any machine that has the percona-toolkit package. The cron should be limited, of course, and don't put it on sumo or addons until we test it... but if the script is on the machine, it's MUCH easier to test. We should also put the checksum user on the machines, with the checksum password, using grants. We should probably change the checksum password, and also set up the grants so they're allowed from any 10.8.70.% and 10.22.70.%.
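For illustration, grants covering both address ranges could be generated along these lines. The real grants are managed by the mysql2 puppet module and are not shown in this bug; the privilege list below is an assumption, and the password clause is deliberately left out.

```shell
# Assumed privilege list for the checksum user; not taken from this bug.
privs='SELECT, PROCESS, SUPER, REPLICATION SLAVE'

# Emit one GRANT statement per address range named in the comment above.
checksum_grants() {
    for net in '10.8.70.%' '10.22.70.%'; do
        printf "GRANT %s ON *.* TO 'checksum'@'%s';\n" "$privs" "$net"
    done
}

grants=$(checksum_grants)
printf '%s\n' "$grants"
```

Printing the statements rather than executing them keeps the sketch safe to run anywhere; the output would still need review before being applied to a server.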
All of that happens with puppet. I can add a cron => false option to make it not run the cronscript, so it will install the rest. Let's keep the mails, too, in case some other error conditions come up that aren't captured by the nagios check.
OK, script is in without the crontask for addons.
notice: /Stage[main]/Mysql2::Checksums/File[/usr/local/bin/mysql-checksum.sh]/ensure: defined content as '{md5}21e608f767c02879895ed50b04fc5435'
notice: /Stage[main]/Mysql2::Checksums/Mysql2::Grant[checksum-percona-db]/Exec[mysql2::grant::checksum-percona-db]/returns: executed successfully
notice: /Stage[main]/Mysql2::Checksums/Mysql2::Grant[checksum-percona-checksum-table]/Exec[mysql2::grant::checksum-percona-checksum-table]/returns: executed successfully
notice: /Stage[main]/Mysql2::Checksums/Mysql2::Grant[checksum-everywhere]/Exec[mysql2::grant::checksum-everywhere]/returns: executed successfully
Sumo's blocked on being moved to mysql2 - bug 717638, which is blocked on upgrading - bug 785987. But it's easy enough to add the script once that's done, so this bug is finished.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Product: mozilla.org → Data & BI Services Team