Bug 793696
Opened 13 years ago
Closed 13 years ago
Put checksumming scripts in puppet
Categories
(Data & BI Services Team :: DB: MySQL, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: dustin, Assigned: scabral)
Bug 774162 got finished without *gasp* putting the script in puppet. With an eye toward bug 90749 and its brethren, we should get that script installed into the mysql2 module.
Comment 1 (Assignee) • 13 years ago
And it should actually be put in a cron job for the bugzilla servers while we're at it.
Assignee: server-ops-database → scabral
Comment 2 (Reporter) • 13 years ago
Comment 3 (Reporter) • 13 years ago
OK, I dropped this in puppet for tp-bugs01-slave01 and puppetagain1. Let's see what happens. Still to do:
* generate a better report format for discrepancies
* turn this into a nagios check
* figure out performance knobs required for other clusters
* run this everywhere
Comment 4 (Reporter) • 13 years ago
Heh, the crontab has some bugs, but luckily 2>&1 >/dev/null makes those invisible :)
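Note that the order of the two redirections decides which stream actually gets discarded. A minimal sketch of the difference, where `noisy` is a made-up stand-in for the cron job and not the real checksum script:

```shell
#!/bin/sh
# Hypothetical stand-in for a cron job: one line on stdout, one on stderr.
noisy() {
    echo "normal output"
    echo "error output" >&2
}

# stderr is first pointed at the current stdout, then stdout alone is sent
# to /dev/null, so the error line is still captured.
stderr_kept="$(noisy 2>&1 >/dev/null)"

# stdout is sent to /dev/null first, then stderr follows it there,
# so nothing is captured at all.
all_dropped="$(noisy >/dev/null 2>&1)"

echo "2>&1 >/dev/null leaves: '${stderr_kept}'"
echo ">/dev/null 2>&1 leaves: '${all_dropped}'"
```

Only the second form silences both streams; with the first form, anything written to stderr still ends up wherever stdout originally pointed.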
Comment 5 (Reporter) • 13 years ago
OK, bugs fixed. Checksums are now running on:
- bugzilla in phx1
- puppetagain in scl3
- dev in scl3
Comment 6 (Reporter) • 13 years ago
Sorry for the noise on dbnotices over the weekend. I introduced some whitespace errors :(
Comment 7 (Reporter) • 13 years ago
I added the script to what will soon be the devtools cluster, for bug 796936.
Comment 8 (Reporter) • 13 years ago
An old version of percona-toolkit, 2.0.4, is installed on dev1 (and was on a few others). Fixed with yum upgrade.
And we're still seeing
---
Cannot connect to h=10.22.70.29,p=...,u=checksum
---
in the emails even when no checksum errors occur. I'll need to figure out how to filter that out, either by filtering the output, or indicating that the backup hosts should not be checked.
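Filtering the output could look roughly like the following. This is only a sketch: the function name is made up, and it assumes the harmless lines always start with "Cannot connect to ".

```shell
#!/bin/sh
# Drop the known-harmless "Cannot connect to ..." lines and pass everything
# else through. grep exits 1 when no lines are left, so "|| true" keeps an
# empty result from looking like a failure.
filter_checksum_output() {
    grep -v '^Cannot connect to ' || true
}

# Example with canned input; real input would come from the checksum script.
printf '%s\n' \
    'Cannot connect to h=10.22.70.29,p=...,u=checksum' \
    'any other line is passed through' | filter_checksum_output
```

Since cron only sends mail when a job produces output, a filter like this would keep the mailbox quiet unless something other than the connection message is printed.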
Comment 9 (Assignee) • 13 years ago
We can ignore those errors. Since the nagios check checks for freshness, do we really need the output of the script e-mailed to us, as opposed to logged to a file on the master? (so if anything goes wrong we can look at the file).
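For illustration, a freshness check against a log file can be approximated by looking at the file's age. This is a sketch only, not the Nagios check that runs on these hosts: the function name, log path and threshold are made up, and it assumes a find that supports -mmin (GNU and BSD find do). The exit codes follow the usual Nagios plugin convention (0 = OK, 2 = CRITICAL).

```shell
#!/bin/sh
# Report CRITICAL if the log file is missing or was last modified more than
# max_age_min minutes ago; otherwise report OK.
check_log_freshness() {
    log_file="$1"
    max_age_min="$2"

    if [ ! -f "$log_file" ]; then
        echo "CRITICAL: $log_file does not exist"
        return 2
    fi

    # find prints the path only when the file is older than the threshold.
    if [ -n "$(find "$log_file" -mmin "+$max_age_min")" ]; then
        echo "CRITICAL: $log_file not updated in the last $max_age_min minutes"
        return 2
    fi

    echo "OK: $log_file updated within the last $max_age_min minutes"
    return 0
}

# Hypothetical usage, expecting at least one run per day:
# check_log_freshness /var/log/mysql-checksum.log 1440
```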
Comment 10 (Reporter) • 13 years ago
There's no nagios check yet, so I'm pretty sure it's not checking for freshness :)
Comment 11 (Assignee) • 13 years ago
There's a nagios check on bugzilla, e.g.:
https://nagios.mozilla.org/phx1/cgi-bin/extinfo.cgi?type=2&host=tp-bugs01-slave02.phx.mozilla.com&service=MySQL+Table+Checksum
Comment 12 (Reporter) • 13 years ago
Oh hey, check that out. I had no idea.
I enabled that for the other clusters. The four clusters running this are:
- bugzilla
- dev in scl3
- puppetagain in scl3
- b1 (devtools) in scl3
I also fixed the "Cannot connect to ..." mails. Hopefully we won't see any email unless the checksum fails; the nagios alert will then make sure we attend to such emails.
How many clusters do we want to put this on? Should we do all but a few high-load clusters, or just add clusters as desired?
Comment 13 (Assignee) • 13 years ago
We don't need email at all; if the checksum fails, the nagios check will page us.
Are you putting the check on the machine, or also adding a cron script? We have Q4 goals to put the check + cron + nagios on sumo and addons (production), so for now I'm OK with putting the script itself on any machine that has the percona-toolkit package. The cron should be limited, of course, and don't put it on sumo or addons until we test it. But if the script is on the machine, it's MUCH easier to test.
We should also put the checksum user on the machines, with the checksum password, using grants. We should probably change the checksum password, and also set up the grants so that the user is allowed to connect from any host in 10.8.70.% and 10.22.70.%.
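As a rough sketch of what the per-network grants could look like, the helper below only prints GRANT statements and runs nothing against MySQL. The function name is made up, the privilege list is an assumption that should be confirmed against the pt-table-checksum documentation, and the password clause is deliberately left out.

```shell
#!/bin/sh
# Print one GRANT statement per allowed source network.
# Usage: print_checksum_grants USER PRIVILEGES NETWORK...
print_checksum_grants() {
    user="$1"
    privs="$2"
    shift 2
    for net in "$@"; do
        printf "GRANT %s ON *.* TO '%s'@'%s';\n" "$privs" "$user" "$net"
    done
}

# Assumed privilege list, not taken from this bug.
print_checksum_grants checksum \
    'SELECT, PROCESS, SUPER, REPLICATION SLAVE' \
    '10.8.70.%' '10.22.70.%'
```

Keeping the networks as arguments means another range can be added later without touching the statement template.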
Comment 14 (Reporter) • 13 years ago
All of that happens with puppet. I can add a cron => false option so that it installs everything except the cron job.
Let's keep the mails, too, in case some other error conditions come up that aren't captured by the nagios check.
Comment 15 (Reporter) • 13 years ago
OK, the script is in for addons, without the cron task.
notice: /Stage[main]/Mysql2::Checksums/File[/usr/local/bin/mysql-checksum.sh]/ensure: defined content as '{md5}21e608f767c02879895ed50b04fc5435'
notice: /Stage[main]/Mysql2::Checksums/Mysql2::Grant[checksum-percona-db]/Exec[mysql2::grant::checksum-percona-db]/returns: executed successfully
notice: /Stage[main]/Mysql2::Checksums/Mysql2::Grant[checksum-percona-checksum-table]/Exec[mysql2::grant::checksum-percona-checksum-table]/returns: executed successfully
notice: /Stage[main]/Mysql2::Checksums/Mysql2::Grant[checksum-everywhere]/Exec[mysql2::grant::checksum-everywhere]/returns: executed successfully
Sumo's blocked on being moved to mysql2 - bug 717638, which is blocked on upgrading - bug 785987. But it's easy enough to add the script once that's done, so this bug is finished.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Updated • 11 years ago
Product: mozilla.org → Data & BI Services Team