Closed
Bug 717638
Opened 13 years ago
Closed 13 years ago
puppet project for mysql
Categories
(Data & BI Services Team :: DB: MySQL, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: scabral, Assigned: scabral)
Attachments
(1 file)
3.14 KB,
patch
We have 3 different puppet repos for mysql; we should combine them or rewrite them with templates.
Assignee | ||
Comment 1•13 years ago
|
||
mysql2 is well underway and even has percona 5.1 and 5.5 forks. We will work on merging the other ones, perhaps by the end of 2012.
Comment 2•13 years ago
|
||
Can I help?
Assignee | ||
Comment 3•13 years ago
|
||
Yes! One of the bigger projects is to move the addons servers (stage and production in phx; that's all we have right now) to the mysql2 puppet directory. When you're done with addons5 in bug 783930, you can work on moving that one over.
The most important issue is making sure /etc/my.cnf doesn't change for the worse, so that's the first thing I check. After that, I make sure any scripts/crons that need to be copied are copied, which involves going through the module files to see what the old module is copying/doing (binlog backup scripts come to mind).
One of the issues will be the .ssh directory for the "mysql" user in /var/lib/mysql. I'd rather not have that on *all* the mysql2 machines, because all directories in the datadir show up as databases, so .ssh looks like a database called "#mysql50#.ssh".
For addons specifically we'll have to add in new variables for auto_increment_increment and auto_increment_offset, but that's very similar to the variable that already exists for innodb_buffer_pool_size.
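To illustrate why these two variables matter for addons, here is a minimal Python sketch (not the puppet change itself; the helper name is made up for this example) of the id sequence a server hands out for a given auto_increment_increment and auto_increment_offset. With an increment of 2 and offsets 1 and 2, two servers never generate the same id.

```python
from itertools import count, islice


def auto_increment_ids(increment, offset):
    """Yield the ids a server would assign to new rows in an empty table.

    Hypothetical helper for illustration only: the values have the form
    offset + N * increment for N = 0, 1, 2, ...
    """
    return count(start=offset, step=increment)


# Two servers sharing increment=2 but using different offsets.
server_a = list(islice(auto_increment_ids(increment=2, offset=1), 5))
server_b = list(islice(auto_increment_ids(increment=2, offset=2), 5))

print(server_a)  # odd ids
print(server_b)  # even ids
```

The same idea extends to more servers: set the increment to the number of writers and give each one a distinct offset.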
That's all I can think of off the top of my head. I'm sure you'll find more.
Assignee | ||
Updated•13 years ago
|
Assignee: scabral → dustin
Assignee | ||
Comment 4•13 years ago
|
||
addons5 is already out of the RO pool in the lb, so you can work on the puppet module stuff for it whenever you're ready.
Comment 5•13 years ago
|
||
Here's an audit of the db servers I know about, based on https://mana.mozilla.org/wiki/display/SYSADMIN/Databases and the puppet manifests:
> CLUSTER         HOST                       PUPPET               NOTES
> A01             tp-a01-master01.phx        db-mysql
> A01             tp-a01-slave01.phx         db-mysql
> A01             tp-a01-slave02.phx         db-mysql
> AMO             addons1.db.phx1            mysql::cluster
> AMO             addons2.db.phx1            mysql::cluster
> AMO             addons3.db.phx1            mysql::cluster
> AMO             addons4.db.phx1            mysql::cluster
> AMO             addons5.db.phx1            mysql::cluster
> AMO             addons6.db.phx1            mysql::cluster
> AMO             addons7.db.phx1            mysql_rhel6::master
> AMO             addons8.db.phx1            mysql::cluster
> AMO             addons9.db.phx1            mysql::cluster
> AMO staging     addons1.stage.db.phx1      mysql_rhel6::master  not in mana
> AMO staging     addons2.stage.db.phx1      mysql::cluster       not in mana
> AMO staging     addons3.stage.db.phx1      mysql::cluster       not in mana
> AMO staging     addons4.stage.db.phx1      mysql::cluster       not in mana
> AMO staging     addons5.stage.db.phx1      mysql::cluster       not in mana
> B1              b1-db1.db.scl3             mysql2::server
> B1              b1-db2.db.scl3             mysql2::server
> B2              b2-db1.db.scl3             -                    still being set up
> B2              b2-db2.db.scl3             -                    still being set up
> Bedrock         bedrock1.db.scl3           mysql2::server
> Bedrock         bedrock2.db.scl3           mysql2::server
> Bouncer         tp-bouncer01-master01.phx  db-mysql
> Bouncer         tp-bouncer01-slave01.phx   db-mysql
> Bouncer         tp-bouncer01-slave02.phx   db-mysql
> Bouncer         tp-bouncer01-slave03.phx   db-mysql
> Bugzilla        tp-bugs01-master01.phx     db-mysql
> Bugzilla        tp-bugs01-slave01.phx      mysql2::server
> Bugzilla        tp-bugs01-slave02.phx      mysql2::server
> Bugzilla        tp-bugs01-slave03.phx      mysql2::server
> Buildbot        buildbot1.db.scl3          mysql2::server
> Buildbot        buildbot2.db.scl3          mysql2::server
> C01             tp-c01-master01.phx        db-mysql
> C01             tp-c01-slave01.phx         db-mysql
> dev.phx         dev1.db.phx1               -
> dev.phx         dev2.db.phx1               -
> dev.scl3        dev1.db.scl3               mysql2::server
> dev.scl3        dev2.db.scl3               mysql2::server
> developer       developer1.db.scl3         mysql2::server
> developer       developer2.db.scl3         mysql2::server
> engagement      engagement1.db.phx1        mysql_rhel6::slave
> engagement      engagement2.db.phx1        mysql_rhel6::slave
> engagement-dev  node36.seamicro.phx1       mysql2::server
> generic         generic1.db.phx1           mysql2::server
> generic         generic2.db.phx1           mysql2::server
> generic         generic1.db.scl3           mysql2::server       not in mana
> generic         generic2.db.scl3           mysql2::server       not in mana
> intranet        intranet1.db.phx1          mysql2::server
> intranet        intranet2.db.phx1          mysql2::server
> intranet        intranet1.stage.db.phx1    mysql2::server       plus mysql2::grant, mysql2::database
> intranet        intranet2.stage.db.phx1    mysql2::server       plus mysql2::grant, mysql2::database
> metrics         mysql1.metrics.scl3        mysql2::server
> metrics         mysql2.metrics.scl3        mysql2::server
> personas        getpersonas1.db.scl3       mysql2::server
> personas        getpersonas2.db.scl3       mysql2::server
> sumo            support1.db.phx1           db-mysql
> sumo            support2.db.phx1           db-mysql
> sumo            support3.db.phx1           db-mysql
> sumo            support4.db.phx1           db-mysql
> sfx01           tp-sfx01-master01          db-mysql
> sfx01           tp-sfx01-slave01           db-mysql
> staging         stage1.db.scl3             -
> staging         stage2.db.scl3             -
> webdev          webdev1.db.scl3            none                 in puppet, but no mysql config
> webdev          webdev2.db.scl3            none                 in puppet, but no mysql config
> backup          backup1.db.scl3            mysql2::backups
> backup          backup2.db.scl3            mysql2::backups
So it looks like the plan is to go to mysql2::server for everything, and I should try it out on addons5 while I have the chance.
Assignee | ||
Comment 6•13 years ago
|
||
So, here's that same list, taking out the machines that are already on mysql2, or are not valid machines (which I've since taken out of puppet). In other words, this is the to-do list:
> CLUSTER         HOST                       PUPPET               NOTES
> A01             tp-a01-master01.phx        db-mysql
> A01             tp-a01-slave01.phx         db-mysql
> A01             tp-a01-slave02.phx         db-mysql
> AMO             addons1.db.phx1            mysql::cluster
> AMO             addons2.db.phx1            mysql::cluster
> AMO             addons3.db.phx1            mysql::cluster
> AMO             addons4.db.phx1            mysql::cluster
> AMO             addons5.db.phx1            mysql::cluster
> AMO             addons6.db.phx1            mysql::cluster
> AMO             addons7.db.phx1            mysql_rhel6::master
> AMO staging     addons1.stage.db.phx1      mysql_rhel6::master  not in mana
> AMO staging     addons2.stage.db.phx1      mysql::cluster       not in mana
> AMO staging     addons3.stage.db.phx1      mysql::cluster       not in mana
> AMO staging     addons4.stage.db.phx1      mysql::cluster       not in mana
> AMO staging     addons5.stage.db.phx1      mysql::cluster       not in mana
> B2              b2-db1.db.scl3             -                    still being set up
> B2              b2-db2.db.scl3             -                    still being set up
> Bouncer         tp-bouncer01-master01.phx  db-mysql
> Bouncer         tp-bouncer01-slave01.phx   db-mysql
> Bouncer         tp-bouncer01-slave02.phx   db-mysql
> Bouncer         tp-bouncer01-slave03.phx   db-mysql
> Bugzilla        tp-bugs01-master01.phx     db-mysql
> C01             tp-c01-master01.phx        db-mysql
> C01             tp-c01-slave01.phx         db-mysql
> dev.phx         dev1.db.phx1               -
> dev.phx         dev2.db.phx1               -
> engagement      engagement1.db.phx1        mysql_rhel6::slave
> engagement      engagement2.db.phx1        mysql_rhel6::slave
> sumo            support1.db.phx1           db-mysql
> sumo            support2.db.phx1           db-mysql
> sumo            support3.db.phx1           db-mysql
> sumo            support4.db.phx1           db-mysql
> sfx01           tp-sfx01-master01          db-mysql
> sfx01           tp-sfx01-slave01           db-mysql
> staging         stage1.db.scl3             -
> staging         stage2.db.scl3             -
> webdev          webdev1.db.scl3            none                 in puppet, but no mysql config
> webdev          webdev2.db.scl3            none                 in puppet, but no mysql config
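For anyone repeating this kind of audit, the "already on mysql2" part of the filtering can be sketched in a few lines of Python. This is only an illustration: the tuple layout and the function name are assumptions, and the invalid machines were removed by hand, not by a script.

```python
def todo_hosts(audit):
    """Return the audit rows whose puppet class is not in the mysql2 module.

    `audit` is a list of (cluster, host, puppet_class) tuples; that layout
    is an assumption made for this sketch.
    """
    return [row for row in audit if not row[2].startswith("mysql2::")]


# A few rows taken from the audit, enough to show the filter working.
sample = [
    ("A01", "tp-a01-master01.phx", "db-mysql"),
    ("AMO", "addons1.db.phx1", "mysql::cluster"),
    ("B1", "b1-db1.db.scl3", "mysql2::server"),
    ("backup", "backup1.db.scl3", "mysql2::backups"),
    ("staging", "stage1.db.scl3", "-"),
]

remaining = todo_hosts(sample)
for cluster, host, puppet_class in remaining:
    print(cluster, host, puppet_class)
```

Hosts with "-" or "none" in the PUPPET column stay on the list, since they still need a mysql2 class.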
Comment 7•13 years ago
|
||
addons5.db.phx1 is moved. For my notes:
Config differences were:
values differ:
- key_buffer_size: puppet=512M, host=4G
- query_cache_size: puppet=0, host=256M
only in puppet:
- log_slave_updates
- character_set_server=utf8
- default_storage_engine=InnoDB
- ft_min_word_len=2
- query_cache_type=0
only on host:
- low-priority-updates
- innodb_data_home_dir =
- innodb_data_file_path = /var/lib/mysql-innodb/innodb.db:10M:autoextend
- innodb_log_group_home_dir = /var/lib/mysql-innodb
- innodb_file_io_threads=4
- [mysql.server]basedir=/var/lib
So I moved /var/lib/mysql-innodb/* to /var/lib/mysql and then renamed innodb.db to ibdata1, which is where InnoDB looks by default once the host-only innodb_* path settings are gone.
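A comparison like the one above can be produced with a small script. The following is a rough Python sketch, not the tool actually used here: the parser is deliberately simplified (it skips !include lines, treats "-" and "_" in option names as the same, and compares values as plain text, so 4G and 4096M would be reported as different), and both function names are made up.

```python
def parse_mycnf(text):
    """Parse my.cnf-style text into {(section, option): value}.

    Options given without a value (e.g. log_slave_updates) map to None.
    """
    settings = {}
    section = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";", "!")):
            continue  # blank line, comment, or !include directive
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip()
            continue
        name, sep, value = line.partition("=")
        name = name.strip().replace("-", "_")
        settings[(section, name)] = value.strip() if sep else None
    return settings


def diff_mycnf(puppet_text, host_text):
    """Return (values that differ, only in puppet, only on host)."""
    puppet = parse_mycnf(puppet_text)
    host = parse_mycnf(host_text)
    differ = {
        key: (puppet[key], host[key])
        for key in puppet.keys() & host.keys()
        if puppet[key] != host[key]
    }
    only_puppet = sorted(puppet.keys() - host.keys())
    only_host = sorted(host.keys() - puppet.keys())
    return differ, only_puppet, only_host


puppet_cnf = "[mysqld]\nkey_buffer_size=512M\nquery_cache_size=0\nlog_slave_updates\n"
host_cnf = "[mysqld]\nkey_buffer_size = 4G\nquery_cache_size = 256M\nlow-priority-updates\n"

differ, only_puppet, only_host = diff_mycnf(puppet_cnf, host_cnf)
for (section, name), (p, h) in sorted(differ.items()):
    print(f"[{section}] {name}: puppet={p}, host={h}")
print("only in puppet:", only_puppet)
print("only on host:", only_host)
```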
Comment 8•13 years ago
|
||
Next up: the remaining addons*.db slaves, one at a time, skirting carefully around addons4 which is disabled.
Comment 9•13 years ago
|
||
addons1.db.phx1 done
Assignee | ||
Comment 10•13 years ago
|
||
addons4 is enabled again.
updated list:
DONE
addons1.db.phx1
addons5.db.phx1
NOT DONE
addons2.db.phx1
addons3.db.phx1
addons4.db.phx1
addons6.db.phx1
addons7.db.phx1 - current master, so tread carefully on this one
Comment 11•13 years ago
|
||
addons2.db.phx1 is converted to mysql2::server and back in the pool
Assignee | ||
Comment 12•13 years ago
|
||
DONE
addons1.db.phx1
addons2.db.phx1
addons5.db.phx1
NOT DONE
addons3.db.phx1
addons4.db.phx1
addons6.db.phx1
addons7.db.phx1 - current master, so tread carefully on this one
Comment 13•13 years ago
|
||
addons3.db.phx1 done
Comment 14•13 years ago
|
||
addons4.db.phx1 done, and tracking is now in the etherpad, because these bug comments are going to get boring.
Assignee | ||
Comment 15•13 years ago
|
||
Heh, good point about the boring comments.
Are you also tracking the scripts and such? I know there is a "copy binary logs" script that gets placed on the filesystem and scheduled through cron via puppet; can we make sure that also happens in the mysql2 module? (Basically, go through everything that happens in the old module and ensure it also happens in the new one, except the .ssh directory, as I mention in comment 3.)
Comment 16•13 years ago
|
||
Gotcha - I'll check through and re-add, with your irc r?
Comment 17•13 years ago
|
||
I'll need to be careful applying this, mapping the slowlogs settings appropriately for everything that's using mysql2::server already.
If I set
slowlogs => true
but don't set slowlogs_backupdest, then the logs will only be rotated, but not copied anywhere. Is this an OK way to handle any systems that are currently writing slow logs but don't have the cron task? (I don't know if there are such systems, but I'll look.)
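To make the proposed behaviour concrete, here is a tiny Python model of it. This is a sketch of the semantics described in this comment, not the module's code; the function name and the example destination are invented, and the "slowlogs off means nothing happens" branch is an assumption.

```python
def slowlog_actions(slowlogs, slowlogs_backupdest=None):
    """Return the slow-log actions implied by the two parameters."""
    if not slowlogs:
        return []  # assumption: nothing is rotated or copied
    actions = ["rotate"]
    if slowlogs_backupdest:
        # Copying only happens when a destination has been set.
        actions.append("copy to " + slowlogs_backupdest)
    return actions


print(slowlog_actions(True))
print(slowlog_actions(True, "backuphost:/slowlogs"))  # hypothetical destination
```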
Assignee | ||
Comment 18•13 years ago
|
||
Comment on attachment 655100 [details] [diff] [review]
bug717638.patch
gave some patch comments in IRC - looks good overall!
Comment 19•13 years ago
|
||
Comment on attachment 655100 [details] [diff] [review]
bug717638.patch
Per sheeri, I should remove the ulimit stuff - mysqld is run from root so it should not have those limits. Other than that, test and go.
Comment 20•13 years ago
|
||
OK, that's landed and seems safe enough.
All of the addons slaves are done, and I'll need some guidance on how to do the master. I'll get started on the staging nodes in the interim.
Assignee | ||
Comment 21•13 years ago
|
||
How much can you do without restarting MySQL? (iirc you only need to restart for confirmation that things aren't going to pot).
Can you try on addons1.stage first - do the migration, but don't do the sanity checking restart?
Assignee | ||
Comment 22•13 years ago
|
||
(When you do addons1.stage, please make sure it has binlog_format=>"MIXED" in the manifest. The other addons stage servers have it in there.)
Assignee | ||
Comment 23•13 years ago
|
||
When bouncer is done, we'll need to set max_connections=4800 on those servers. (should that be a separate bug?)
Comment 24•13 years ago
|
||
Well, so far I've also been moving the data to the appropriate directory, which *does* require a restart. And yes, that should be a separate bug.
Comment 25•13 years ago
|
||
(In reply to Sheeri Cabral [:sheeri] from comment #22)
> (when you do addons1.stage, please make sure it has binlog_format=>"MIXED"
> in the manifest. the other addons stage servers have it in there).
This is part of the my.cnf installed by mysql2::server, so no problem.
Comment 26•13 years ago
|
||
addons slaves are done. I'll do a practice failover of the addons.stage master tomorrow morning before 9am us/eastern. In the interim, I'll continue working on the slaves in the 2-server m/s clusters.
Comment 27•13 years ago
|
||
At least the stage*.db.scl3 servers listed above aren't getting *any* mysql config from puppet. They are left with config that was written back in April, when they were named tm-db-whatever. So, that's no fun! I've disabled puppet on one of them, and will be running it with --noop until I'm satisfied the results are safe to run for real.
Comment 28•13 years ago
|
||
Here are the my.cnf diffs for stage2.db.scl3:
* server-id changes
* old has slave-skip-errors=1062
* old has skip-locking
* log-bin, relay-log, and relay-log-index have old hostname in filename
* new has skip-external-locking
* new has default_storage_engine=InnoDB (probably ok?)
* new has ft_stopword_file= (means default?)
* -key_buffer=120M -> +key_buffer_size=512M
* +join_buffer_size=8M
* +net_buffer_length=32K
* +preload_buffer_size=2M
* wait_timeout 120 -> 600
* +table_open_cache=3072
* +read_buffer_size=8M
* -log-queries-not-using-indexes
* -innodb_buffer_pool_size=2160M -> 1536M
* -innodb_data_home_dir =
* -innodb_data_file_path = /var/lib/mysql-innodb/innodb.db:1000M:autoextend
* -innodb_log_group_home_dir = /var/lib/mysql-innodb
* -innodb_log_arch_dir = /var/lib/mysql-innodb
* -innodb_file_io_threads=4
* -innodb_log_archive=0
* +innodb_file_per_table
* +innodb_flush_method=O_DIRECT
* -sort_buffer=6M -> 256K
I'll see if I can look up some of these and take a guess, but sheeri, please flag anything that sets off alarm bells for you in https://etherpad.mozilla.org/Yh0d5ymWiN in case I miss it.
Assignee | ||
Comment 29•13 years ago
|
||
In reply to comment 28, I have updated the etherpad.
Comment 30•13 years ago
|
||
I'm going to hold off on stage in light of bug 789035. Sheeri has added some good notes in the etherpad when the time comes to move it to mysql2::server.
Comment 31•13 years ago
|
||
I updated the dependencies and the etherpad to reflect the clusters where MySQL itself (rather than the Puppet module, as here) is being upgraded. We should roll in the puppet upgrade with the MySQL upgrade. That leaves available for work now:
> Bugzilla        tp-bugs01-master01.phx     db-mysql
> dev.phx         dev1.db.phx1               -
> dev.phx         dev2.db.phx1               -
> engagement      engagement1.db.phx1        mysql_rhel6::slave
> engagement      engagement2.db.phx1        mysql_rhel6::slave
> webdev          webdev1.db.scl3            none                 in puppet, but no mysql config
> webdev          webdev2.db.scl3            none                 in puppet, but no mysql config
Comment 32•13 years ago
|
||
Looks like sheeri knocked out dev.phx two days ago.
Assignee | ||
Comment 33•13 years ago
|
||
oops, yeah, forgot to let you know. :D I was upgrading them.
Note that tp-bugs01-master01 is currently a slave, so it can be taken out of the load balancer and changed whenever needed.
Comment 34•13 years ago
|
||
Looks like tp-bugs01-master01 already got fixed, too:
e531b4f4 (scabral@mozilla.com 2012-08-23 17:55:43 +0000 643) "mysql2::server":
As for the engagement cluster, a diff of my.cnf shows only a few questions:
* mysql2::server will remove 'low-priority-updates'. Do we need an option for that, or can it be changed?
* mysql2::server will change innodb_flush_log_at_trx_commit from 0 to 2. From my read, these are roughly equivalent, with 2 being a bit more resilient to failure. Is this an OK change?
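For reference, the difference between the settings can be written down as a small Python model. This is only a summary sketch of the documented redo-log behaviour at commit time; the failure labels and the function name are made up for illustration.

```python
# What InnoDB does with the redo log at COMMIT for each value of
# innodb_flush_log_at_trx_commit. With 0 and 2 the flush to disk
# happens roughly once per second instead of at every commit.
BEHAVIOUR = {
    0: {"write_at_commit": False, "fsync_at_commit": False},
    1: {"write_at_commit": True, "fsync_at_commit": True},
    2: {"write_at_commit": True, "fsync_at_commit": False},
}


def commit_survives(setting, failure):
    """Is a just-committed transaction safe against `failure`?

    `failure` is "mysqld_crash" (the process dies, the OS keeps running)
    or "os_crash" (OS crash or power loss).
    """
    mode = BEHAVIOUR[setting]
    if failure == "mysqld_crash":
        # Data already handed to the OS survives a process crash.
        return mode["write_at_commit"]
    if failure == "os_crash":
        # Only data flushed to disk survives an OS crash.
        return mode["fsync_at_commit"]
    raise ValueError("unknown failure type: " + failure)


for setting in (0, 2):
    print(setting,
          commit_survives(setting, "mysqld_crash"),
          commit_survives(setting, "os_crash"))
```

So moving from 0 to 2 keeps the same exposure to an OS crash or power loss (about a second of transactions) while no longer losing recent commits when only mysqld dies.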
Assignee | ||
Comment 35•13 years ago
|
||
Both of those are OK changes to make (and ones I've made on the other clusters; they're things that were put in by default back when they were good ideas).
Comment 36•13 years ago
|
||
OK, I'll start the upgrades on the engagement slave shortly. We'll need a planned downtime for the master, I think.
Comment 37•13 years ago
|
||
The remainder is stalled on needing a planned downtime:
- upgrading most clusters to 5.1 or higher (bug 790300)
- switching masters to mysql2::server
Assignee | ||
Comment 38•13 years ago
|
||
All masters should have failovers, so if engagement isn't set up in that configuration, we should talk. But there shouldn't need to be a scheduled downtime; we should be able to switch over to the slave for writes during the maintenance/upgrade.
Comment 39•13 years ago
|
||
This is happening, slowly but surely:
> A01 tp-a01-master01.phx db-mysql
> A01 tp-a01-slave01.phx db-mysql
> A01 tp-a01-slave02.phx db-mysql
- needs to be upgraded first - https://bugzilla.mozilla.org/show_bug.cgi?id=780954
> B2 b2-db2.db.scl3 -
Hopefully we can reinstall with rhel6 - rhel5 has too many package conflicts
> datazilla1.db.scl3.mozilla.com
> datazilla2.db.scl3.mozilla.com
- ???
> dp-geodns01.phx.mozilla.com
> geodns1.vips.scl3.mozilla.com
- ??? (and yes, geodns1.vips is a VM, not a VIP)
> tp-b02-master01.phx.mozilla.com
> C01 tp-c01-master01.phx db-mysql
> C01 tp-c01-slave01.phx db-mysql
- needs to be upgraded or decomm'd - https://bugzilla.mozilla.org/show_bug.cgi?id=790413
> sumo support1.db.phx1 db-mysql
> sumo support2.db.phx1 db-mysql
> sumo support3.db.phx1 db-mysql
> sumo support4.db.phx1 db-mysql
- needs to be upgraded - https://bugzilla.mozilla.org/show_bug.cgi?id=785987
> webdev webdev1.db.scl3 none in puppet, but no mysql config
> webdev webdev2.db.scl3 none in puppet, but no mysql config
- needs to be upgraded - https://bugzilla.mozilla.org/show_bug.cgi?id=791316
- needs to have the mysql2 class added - bug 801473
** note that there is now a mysql2-puppetized-server nagios hostgroup, so as nodes are added to puppet, they should be added to that hostgroup too.
Assignee | ||
Comment 40•13 years ago
|
||
tp-b02-master01.phx.mozilla.com was put in earlier this week.
Assignee | ||
Comment 41•13 years ago
|
||
support3 was removed from db-mysql and put into mysql2 today.
Assignee | ||
Comment 42•13 years ago
|
||
I think all the support and a01 machines are done, and c01 is turned off. Can I get an updated list? (If c01 is in there, let me know; I can delete them from puppet.)
Comment 43•13 years ago
|
||
Looks like the remainder is:
> datazilla1.db.scl3.mozilla.com
> dp-geodns01.phx.mozilla.com
> geodns1.vips.scl3.mozilla.com
and really, this shouldn't be assigned to me :(
Assignee: dustin → scabral
Assignee | ||
Comment 44•13 years ago
|
||
These are scheduled to go away as per https://bugzilla.mozilla.org/show_bug.cgi?id=796161
> dp-geodns01.phx.mozilla.com
> geodns1.vips.scl3.mozilla.com
And as for datazilla1, well, datazilla2 was already in there, so I added datazilla1.
I'm going to resolve this one.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Updated•11 years ago
|
Product: mozilla.org → Data & BI Services Team