Closed Bug 717638 Opened 13 years ago Closed 13 years ago

puppet project for mysql

Categories

(Data & BI Services Team :: DB: MySQL, task)


Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: scabral, Assigned: scabral)

Attachments

(1 file)

We have 3 different puppet repos for mysql; we should combine them and rewrite with templates.
mysql2 is well underway and even has percona 5.1 and 5.5 forks. We will work on merging the other ones, perhaps by the end of 2012.
Yes! One of the bigger projects is to move the addons servers (stage and production in phx; that's all we have right now) to the mysql2 puppet directory. When you're done with addons5 in bug 783930 you can work on moving that one over.

The most important issue is making sure /etc/my.cnf doesn't change for the worse, which is the first step I do. After that, I make sure any scripts/crons that need to be copied are, which involves going through the module files to see what it's copying/doing (e.g. binlog backup scripts come to mind).

One of the issues will be the .ssh directory for the "mysql" user in /var/lib/mysql. I'd rather not have that be on *all* the mysql2 machines, because all directories in the datadir show up as databases, so .ssh looks like a database called "#mysql50#.ssh".

For addons specifically we'll have to add in new variables for auto_increment_increment and auto_increment_offset, but that's very similar to the variable that already exists for innodb_buffer_pool_size.

That's all I can think of off the top of my head. I'm sure you'll find more.
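The my.cnf check described above can be sketched as a normalized diff: strip comments and blank lines, drop whitespace, sort, and compare, so only real setting changes show up. The sample file contents below are illustrative stand-ins, not the actual configs; a real run would point at the live /etc/my.cnf and the puppet-rendered copy.

```shell
#!/bin/bash
# Hedged sketch: diff two my.cnf files ignoring comments, blank lines, and
# ordering. The sample files are stand-ins for the live /etc/my.cnf and the
# puppet-staged copy; the values shown are illustrative only.
normalize() { grep -vE '^[[:space:]]*(#|$)' "$1" | tr -d ' \t' | sort; }

old=$(mktemp) && new=$(mktemp)
printf '[mysqld]\nkey_buffer_size=4G\nquery_cache_size=256M\n' > "$old"
printf '# managed by puppet\n[mysqld]\nkey_buffer_size=512M\nquery_cache_size=0\n' > "$new"

# Only genuinely changed settings survive the normalization.
result=$(diff <(normalize "$old") <(normalize "$new") || true)
echo "$result"
rm -f "$old" "$new"
```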
Assignee: scabral → dustin
addons5 is already out of the RO pool in the lb, so you can work on the puppet module stuff for it whenever you're ready.
Here's an audit of the db servers I know about, based on https://mana.mozilla.org/wiki/display/SYSADMIN/Databases and the puppet manifests:

> CLUSTER        HOST                      PUPPET              NOTES
> A01            tp-a01-master01.phx       db-mysql
> A01            tp-a01-slave01.phx        db-mysql
> A01            tp-a01-slave02.phx        db-mysql
> AMO            addons1.db.phx1           mysql::cluster
> AMO            addons2.db.phx1           mysql::cluster
> AMO            addons3.db.phx1           mysql::cluster
> AMO            addons4.db.phx1           mysql::cluster
> AMO            addons5.db.phx1           mysql::cluster
> AMO            addons6.db.phx1           mysql::cluster
> AMO            addons7.db.phx1           mysql_rhel6::master
> AMO            addons8.db.phx1           mysql::cluster
> AMO            addons9.db.phx1           mysql::cluster
> AMO staging    addons1.stage.db.phx1     mysql_rhel6::master not in mana
> AMO staging    addons2.stage.db.phx1     mysql::cluster      not in mana
> AMO staging    addons3.stage.db.phx1     mysql::cluster      not in mana
> AMO staging    addons4.stage.db.phx1     mysql::cluster      not in mana
> AMO staging    addons5.stage.db.phx1     mysql::cluster      not in mana
> B1             b1-db1.db.scl3            mysql2::server
> B1             b1-db2.db.scl3            mysql2::server
> B2             b2-db1.db.scl3            -                   still being set up
> B2             b2-db2.db.scl3            -                   still being set up
> Bedrock        bedrock1.db.scl3          mysql2::server
> Bedrock        bedrock2.db.scl3          mysql2::server
> Bouncer        tp-bouncer01-master01.phx db-mysql
> Bouncer        tp-bouncer01-slave01.phx  db-mysql
> Bouncer        tp-bouncer01-slave02.phx  db-mysql
> Bouncer        tp-bouncer01-slave03.phx  db-mysql
> Bugzilla       tp-bugs01-master01.phx    db-mysql
> Bugzilla       tp-bugs01-slave01.phx     mysql2::server
> Bugzilla       tp-bugs01-slave02.phx     mysql2::server
> Bugzilla       tp-bugs01-slave03.phx     mysql2::server
> Buildbot       buildbot1.db.scl3         mysql2::server
> Buildbot       buildbot2.db.scl3         mysql2::server
> C01            tp-c01-master01.phx       db-mysql
> C01            tp-c01-slave01.phx        db-mysql
> dev.phx        dev1.db.phx1              -
> dev.phx        dev2.db.phx1              -
> dev.scl3       dev1.db.scl3              mysql2::server
> dev.scl3       dev2.db.scl3              mysql2::server
> developer      developer1.db.scl3        mysql2::server
> developer      developer2.db.scl3        mysql2::server
> engagement     engagement1.db.phx1       mysql_rhel6::slave
> engagement     engagement2.db.phx1       mysql_rhel6::slave
> engagement-dev node36.seamicro.phx1      mysql2::server
> generic        generic1.db.phx1          mysql2::server
> generic        generic2.db.phx1          mysql2::server
> generic        generic1.db.scl3          mysql2::server      not in mana
> generic        generic2.db.scl3          mysql2::server      not in mana
> intranet       intranet1.db.phx1         mysql2::server
> intranet       intranet2.db.phx1         mysql2::server
> intranet       intranet1.stage.db.phx1   mysql2::server      plus mysql2::grant, mysql2::database
> intranet       intranet2.stage.db.phx1   mysql2::server      plus mysql2::grant, mysql2::database
> metrics        mysql1.metrics.scl3       mysql2::server
> metrics        mysql2.metrics.scl3       mysql2::server
> personas       getpersonas1.db.scl3      mysql2::server
> personas       getpersonas2.db.scl3      mysql2::server
> sumo           support1.db.phx1          db-mysql
> sumo           support2.db.phx1          db-mysql
> sumo           support3.db.phx1          db-mysql
> sumo           support4.db.phx1          db-mysql
> sfx01          tp-sfx01-master01         db-mysql
> sfx01          tp-sfx01-slave01          db-mysql
> staging        stage1.db.scl3            -
> staging        stage2.db.scl3            -
> webdev         webdev1.db.scl3           none in puppet, but no mysql config
> webdev         webdev2.db.scl3           none in puppet, but no mysql config
> backup         backup1.db.scl3           mysql2::backups
> backup         backup2.db.scl3           mysql2::backups

So it looks like the plan is to go to mysql2::server for everything, and I should try it out on addons5 while I have the chance.
So, here's that same list, taking out the machines that are already on mysql2, or are not valid machines (which I've since taken out of puppet). In other words, this is the to-do list:

> CLUSTER      HOST                      PUPPET              NOTES
> A01          tp-a01-master01.phx       db-mysql
> A01          tp-a01-slave01.phx        db-mysql
> A01          tp-a01-slave02.phx        db-mysql
> AMO          addons1.db.phx1           mysql::cluster
> AMO          addons2.db.phx1           mysql::cluster
> AMO          addons3.db.phx1           mysql::cluster
> AMO          addons4.db.phx1           mysql::cluster
> AMO          addons5.db.phx1           mysql::cluster
> AMO          addons6.db.phx1           mysql::cluster
> AMO          addons7.db.phx1           mysql_rhel6::master
> AMO staging  addons1.stage.db.phx1     mysql_rhel6::master not in mana
> AMO staging  addons2.stage.db.phx1     mysql::cluster      not in mana
> AMO staging  addons3.stage.db.phx1     mysql::cluster      not in mana
> AMO staging  addons4.stage.db.phx1     mysql::cluster      not in mana
> AMO staging  addons5.stage.db.phx1     mysql::cluster      not in mana
> B2           b2-db1.db.scl3            -                   still being set up
> B2           b2-db2.db.scl3            -                   still being set up
> Bouncer      tp-bouncer01-master01.phx db-mysql
> Bouncer      tp-bouncer01-slave01.phx  db-mysql
> Bouncer      tp-bouncer01-slave02.phx  db-mysql
> Bouncer      tp-bouncer01-slave03.phx  db-mysql
> Bugzilla     tp-bugs01-master01.phx    db-mysql
> C01          tp-c01-master01.phx       db-mysql
> C01          tp-c01-slave01.phx        db-mysql
> dev.phx      dev1.db.phx1              -
> dev.phx      dev2.db.phx1              -
> engagement   engagement1.db.phx1       mysql_rhel6::slave
> engagement   engagement2.db.phx1       mysql_rhel6::slave
> sumo         support1.db.phx1          db-mysql
> sumo         support2.db.phx1          db-mysql
> sumo         support3.db.phx1          db-mysql
> sumo         support4.db.phx1          db-mysql
> sfx01        tp-sfx01-master01         db-mysql
> sfx01        tp-sfx01-slave01          db-mysql
> staging      stage1.db.scl3            -
> staging      stage2.db.scl3            -
> webdev       webdev1.db.scl3           none in puppet, but no mysql config
> webdev       webdev2.db.scl3           none in puppet, but no mysql config
addons5.db.phx1 is moved. For my notes, the config differences were:

values differ:
- key_buffer_size: puppet=512M, host=4G
- query_cache_size: puppet=0, host=256M

only in puppet:
- log_slave_updates
- character_set_server=utf8
- default_storage_engine=InnoDB
- ft_min_word_len=2
- query_cache_type=0

only on host:
- low-priority-updates
- innodb_data_home_dir =
- innodb_data_file_path = /var/lib/mysql-innodb/innodb.db:10M:autoextend
- innodb_log_group_home_dir = /var/lib/mysql-innodb
- innodb_file_io_threads=4
- [mysql.server] basedir=/var/lib

So I moved /var/lib/mysql-innodb/* to /var/lib/mysql, and then moved innodb.db to ibdata1.
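The file shuffle at the end amounts to a couple of mv commands. Sketched here against a throwaway temp directory so it is safe to run; on the real host the same moves would run against /var/lib/mysql and /var/lib/mysql-innodb with mysqld stopped, followed by a chown -R mysql:mysql on the datadir.

```shell
#!/bin/bash
# Simulation of the datadir consolidation in a temp dir (safe to run).
# On the real host: stop mysqld first, operate on /var/lib/mysql{,-innodb},
# and chown -R mysql:mysql the datadir afterwards.
root=$(mktemp -d)
mkdir -p "$root/mysql" "$root/mysql-innodb"
touch "$root/mysql-innodb/innodb.db" "$root/mysql-innodb/ib_logfile0"

# Move the InnoDB files into the datadir proper.
mv "$root/mysql-innodb/"* "$root/mysql/"
# Rename the system tablespace to the default name puppet's my.cnf expects.
mv "$root/mysql/innodb.db" "$root/mysql/ibdata1"
rmdir "$root/mysql-innodb"

ls "$root/mysql"
```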
Next up: the remaining addons*.db slaves, one at a time, skirting carefully around addons4 which is disabled.
addons1.db.phx1 done
addons4 is enabled again. Updated list:

DONE:
- addons1.db.phx1
- addons5.db.phx1

NOT DONE:
- addons2.db.phx1
- addons3.db.phx1
- addons4.db.phx1
- addons6.db.phx1
- addons7.db.phx1 - current master, so tread carefully on this one
addons2.db.phx1 is converted to mysql2::server and back in the pool
DONE:
- addons1.db.phx1
- addons2.db.phx1
- addons5.db.phx1

NOT DONE:
- addons3.db.phx1
- addons4.db.phx1
- addons6.db.phx1
- addons7.db.phx1 - current master, so tread carefully on this one
addons3.db.phx1 done
addons4.db.phx1 done, and tracking is now in the etherpad, because these bug comments are going to get boring.
Heh, good point about the boring comments. Are you also tracking the scripts and such? I know there is a "copy binary logs" script that gets placed on the filesystem and scheduled through cron via puppet, can we make sure that also happens in the mysql2 module? (basically go through everything that happens in the old module and ensure it also happens in the new one...except the .ssh directory, as I mention in comment 3).
Gotcha - I'll check through and re-add, with your irc r?
Attached patch bug717638.patch (Splinter Review)
I'll need to be careful applying this, mapping the slowlogs settings appropriately for everything that's using mysql2::server already. If I set slowlogs => true but don't set slowlogs_backupdest, then the logs will only be rotated, but not copied anywhere. Is this an OK way to handle any systems that are currently writing slow logs, but not the crontask? (I don't know if there are such systems, but I'll look)
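The rotate-but-don't-copy behavior I have in mind looks roughly like this cron script. SLOW_LOG and BACKUP_DEST are illustrative names, not the module's actual parameters (slowlogs_backupdest is the real one); the demo runs against a temp directory so it is safe to execute.

```shell
#!/bin/bash
# Hedged sketch of the slowlogs behavior: always rotate the slow log, but
# only ship it somewhere if a backup destination is set (the analogue of
# slowlogs_backupdest). Names and paths are illustrative, not the module's.
demo=$(mktemp -d)
SLOW_LOG="$demo/slow.log"
BACKUP_DEST=""              # empty ~ slowlogs_backupdest unset: rotate only
echo "sample slow query" > "$SLOW_LOG"

ts=$(date +%Y%m%d)
mv "$SLOW_LOG" "$SLOW_LOG.$ts"   # rotate
touch "$SLOW_LOG"                # a real server would also need FLUSH LOGS
if [ -n "$BACKUP_DEST" ]; then
  cp "$SLOW_LOG.$ts" "$BACKUP_DEST/"   # copy step, skipped when unset
fi
ls "$demo"
```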
Comment on attachment 655100 (bug717638.patch):

Gave some patch comments in IRC - looks good overall!
Comment on attachment 655100 (bug717638.patch):

Per sheeri, I should remove the ulimit stuff; mysqld is run from root, so it should not have those limits. Other than that, test and go.
OK, that's landed and seems safe enough. All of the addons slaves are done, and I'll need some guidance on how to do the master. I'll get started on the staging nodes in the interim.
How much can you do without restarting MySQL? (iirc you only need to restart for confirmation that things aren't going to pot). Can you try on addons1.stage first - do the migration, but don't do the sanity checking restart?
(When you do addons1.stage, please make sure it has binlog_format => "MIXED" in the manifest. The other addons stage servers have it in there.)
When bouncer is done, we'll need to set max_connections=4800 on those servers. (should that be a separate bug?)
Well, so far I've also been moving the data to the appropriate directory, which *does* require a restart. And yes, that should be a separate bug.
(In reply to Sheeri Cabral [:sheeri] from comment #22)
> (when you do addons1.stage, please make sure it has binlog_format=>"MIXED" in the manifest. the other addons stage servers have it in there).

This is part of the my.cnf installed by mysql2::server, so no problem.
addons slaves are done. I'll do a practice failover of the addons.stage master tomorrow morning before 9am us/eastern. In the interim, I'll continue working on the slaves in the 2-server m/s clusters.
At least the stage*.db.scl3 servers listed above aren't getting *any* mysql config from puppet -- they are left with config that was written back in april, when they were named tm-db-whatever. So, that's no fun! I've disabled puppet on one of them, and will be running it with --noop until I'm satisfied the results are safe to run for real.
Here are the my.cnf diffs for stage2.db.scl3:

* server-id changes
* old has slave-skip-errors=1062
* old has skip-locking
* log-bin, relay-log, and relay-log-index have old hostname in filename
* new has skip-external-locking
* new has default_storage_engine=InnoDB (probably ok?)
* new has ft_stopword_file= (means default?)
* key_buffer=120M -> key_buffer_size=512M
* +join_buffer_size=8M
* +net_buffer_length=32K
* +preload_buffer_size=2M
* wait_timeout 120 -> 600
* +table_open_cache=3072
* +read_buffer_size=8M
* -log-queries-not-using-indexes
* innodb_buffer_pool_size=2160M -> 1536M
* -innodb_data_home_dir =
* -innodb_data_file_path = /var/lib/mysql-innodb/innodb.db:1000M:autoextend
* -innodb_log_group_home_dir = /var/lib/mysql-innodb
* -innodb_log_arch_dir = /var/lib/mysql-innodb
* -innodb_file_io_threads=4
* -innodb_log_archive=0
* +innodb_file_per_table
* +innodb_flush_method=O_DIRECT
* sort_buffer=6M -> 256K

I'll see if I can look up some of these and take a guess, but sheeri, please flag anything that sets off alarm bells for you in https://etherpad.mozilla.org/Yh0d5ymWiN in case I miss it.
In reply to comment 28, I have updated the etherpad.
I'm going to hold off on stage in light of bug 789035. Sheeri has added some good notes in the etherpad when the time comes to move it to mysql2::server.
I updated the dependencies and the etherpad to reflect the clusters where MySQL itself (rather than the Puppet module, as here) is being upgraded. We should roll in the puppet upgrade with the MySQL upgrade. That leaves available for work now:

> Bugzilla     tp-bugs01-master01.phx    db-mysql
> dev.phx      dev1.db.phx1              -
> dev.phx      dev2.db.phx1              -
> engagement   engagement1.db.phx1       mysql_rhel6::slave
> engagement   engagement2.db.phx1       mysql_rhel6::slave
> webdev       webdev1.db.scl3           none in puppet, but no mysql config
> webdev       webdev2.db.scl3           none in puppet, but no mysql config
Looks like sheeri knocked out dev.phx two days ago.
oops, yeah, forgot to let you know. :D I was upgrading them. Note that tp-bugs01-master01 is currently a slave, so it can be taken out of the load balancer and changed whenever needed.
Looks like tp-bugs01-master01 already got fixed, too:

e531b4f4 (scabral@mozilla.com 2012-08-23 17:55:43 +0000 643) "mysql2::server":

As for the engagement cluster, a diff of my.cnf shows only a few questions:

* mysql2::server will remove 'low-priority-updates'. Do we need an option for that, or can it be changed?
* mysql2::server will change innodb_flush_log_at_trx_commit from 0 to 2. From my read, these are roughly equivalent, with 2 being a bit more resilient to failure. Is this an OK change?
Both those are OK changes to make (and ones I've made on the other clusters; they're things that were put in by default back when they were good ideas).
OK, I'll start the upgrades on the engagement slave shortly. We'll need a planned downtime for the master, I think.
The remainder is stalled on needing a planned downtime:
- upgrading most clusters to 5.1 or higher (bug 790300)
- switching masters to mysql2::server
All masters should have failovers, so if engagement isn't set up in that configuration, we should talk. But there shouldn't need to be a scheduled downtime, we should be able to switch over to the slave for writes during the maintenance/upgrade.
This is happening, slowly but surely:

> A01 tp-a01-master01.phx db-mysql
> A01 tp-a01-slave01.phx db-mysql
> A01 tp-a01-slave02.phx db-mysql
- needs to be upgraded first - https://bugzilla.mozilla.org/show_bug.cgi?id=780954

> B2 b2-db2.db.scl3
- Hopefully we can reinstall with rhel6 - rhel5 has too many package conflicts

> datazilla1.db.scl3.mozilla.com
> datazilla2.db.scl3.mozilla.com
- ???

> dp-geodns01.phx.mozilla.com
> geodns1.vips.scl3.mozilla.com
- ??? (and yes, geodns1.vips is a VM, not a VIP)

> tp-b02-master01.phx.mozilla.com

> C01 tp-c01-master01.phx db-mysql
> C01 tp-c01-slave01.phx db-mysql
- needs to be upgraded or decomm'd - https://bugzilla.mozilla.org/show_bug.cgi?id=790413

> sumo support1.db.phx1 db-mysql
> sumo support2.db.phx1 db-mysql
> sumo support3.db.phx1 db-mysql
> sumo support4.db.phx1 db-mysql
- needs to be upgraded - https://bugzilla.mozilla.org/show_bug.cgi?id=785987

> webdev webdev1.db.scl3 none in puppet, but no mysql config
> webdev webdev2.db.scl3 none in puppet, but no mysql config
- needs to be upgraded - https://bugzilla.mozilla.org/show_bug.cgi?id=791316
- needs to have the mysql2 class added - bug 801473

** Note that there is now a mysql2-puppetized-server nagios hostgroup, so as nodes are added to puppet, they should be added to that hostgroup too.
tp-b02-master01.phx.mozilla.com was put in earlier this week.
support3 was removed from db-mysql and put into mysql2 today.
I think all the support and a01 machines are done, and c01 is turned off. Can I get an updated list? (If c01 is in there, let me know; I can delete them from puppet.)
Looks like the remainder is:

> datazilla1.db.scl3.mozilla.com
> dp-geodns01.phx.mozilla.com
> geodns1.vips.scl3.mozilla.com

And really, this shouldn't be assigned to me :(
Assignee: dustin → scabral
These are scheduled to go away as per https://bugzilla.mozilla.org/show_bug.cgi?id=796161:

> dp-geodns01.phx.mozilla.com
> geodns1.vips.scl3.mozilla.com

And as for datazilla1, well, datazilla2 was already in there, so I added datazilla1. I'm going to resolve this one.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Product: mozilla.org → Data & BI Services Team