Closed Bug 1378433 Opened 7 years ago Closed 7 years ago

Update from MySQL 5.6 to 5.7

Categories

(Tree Management :: Treeherder, enhancement, P2)

enhancement

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Assigned: emorley)

References

Details

Attachments

(9 files)

MySQL 5.7 has much better performance for some schema migration operations (eg 2-3 times as fast), as well as a number of other performance and bug fixes. First step will be testing MySQL 5.7 in Vagrant, then Travis, then on the Heroku prototype's RDS instance, ...
For MySQL 5.6 -> 5.7 upgrades, the upgrade process forces the use of the new temporal types, which if upgrading from tables created by a version older than 5.6.4 will mean very long ALTER TABLES during the upgrade. Using the scripts at: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_UpgradeDBInstance.MySQL.html https://cloudiamo.com/2016/11/14/migrating-a-rds-mysql-5-6-to-rds-mysql-5-7/ ...it seems as though we don't have any such tables so should be fine. The test on prototype will confirm this (depending on whether it just takes 5 minutes or hours).
Depends on: 1378950
Comment on attachment 8884109 [details] [review] [treeherder] mozilla:mysql-5.7 > mozilla:master This switches Vagrant/Travis, and I think is worth landing now rather than waiting for the Terraform pieces, so we get longer to sanity check the new version locally through normal usage. Notes on changes on the new version: https://dev.mysql.com/doc/refman/5.7/en/upgrading-from-previous-series.html Once we're done with this bug (switching all environments to 5.7), we can also look into removing our custom sql_mode setting and using the new stricter default in 5.7 (we had to enable a few things since the 5.6 defaults were pretty lax, but now there are even more things enabled out of the box): https://dev.mysql.com/doc/refman/5.7/en/sql-mode.html#sql-mode-changes
Attachment #8884109 - Flags: review?(cdawson)
Attachment #8884109 - Flags: review?(cdawson) → review+
Commits pushed to master at https://github.com/mozilla/treeherder https://github.com/mozilla/treeherder/commit/fd75e9ae15fcb2d2181097d05c8e2f040399916f Bug 1378433 - Update to MySQL 5.7 in Vagrant/Travis The Vagrant environment is running Ubuntu 16.04, which has a native mysql-server-5.7 package available, so doesn't need a custom APT repository to be set. Travis is still on Ubuntu 14.04 (since they don't offer 16.04 yet), so has to use the upstream mysql.com repository instead. In the future I'll be switching both Travis and local development to use the same Docker images to avoid these kind of differences. After this lands, we'll want to open a PR against the Terraform configs to switch the treeherder-dev RDS instance to MySQL 5.7, before doing the same for stage/prod. https://github.com/mozilla/treeherder/commit/225c772e38b732e0bb6ee6cb494870ffc7ba4a41 Bug 1378433 - Stop vendoring libmysqlclient in Vagrant/Travis Now that we're using MySQL 5.7 server, there is no longer a conflict between our desired choice of client library (which had to be 5.7 to avoid security vulnerabilities) and the server version. The library is still vendored on Heroku for now (in `bin/pre_compile`), but that can stop too, once we switch to the Heroku-16 stack which is based on Ubuntu 16.04 (bug 1365567).
Comment on attachment 8884252 [details] [review] devservices-aws PR#53: Upgrade treeherder-dev to MySQL 5.7 Git merged and terraform has been applied. I temporarily enabled allow_major_version_upgrade to allow the upgrade. Once the upgrade completed and state achieved, allow_major_version_upgrade was removed to be set back to false (default). ~ aws_db_instance.treeherder-dev-rds allow_major_version_upgrade: "" => "true" engine_version: "5.6.34" => "5.7.17" parameter_group_name: "treeherder" => "treeherder-mysql57" aws_db_instance.treeherder-dev-rds: Still modifying... (ID: treeherder-dev, 1h12m51s elapsed) aws_db_instance.treeherder-dev-rds: Modifications complete (ID: treeherder-dev)
Attachment #8884252 - Flags: review?(jwatkins)
Attachment #8884252 - Flags: review+
Attachment #8884252 - Flags: checkin+
Thank you for merging/applying that :-) (In reply to Jake Watkins [:dividehex] from comment #6) > Git merged and terraform has been applied. I temporarily enabled > allow_major_version_upgrade to allow the upgrade. Once the upgrade > completed and state achieved, allow_major_version_upgrade was removed to be > set back to false (default). Ah the `allow_major_version_upgrade` setting doesn't appear to exist in the UI so wasn't aware of it. I see it now on: https://www.terraform.io/docs/providers/aws/r/db_instance.html Looking at the timestamps in the RDS logs (plus the "1h12m51s elapsed" above) got me concerned initially that the update did need the ALTER TABLEs for legacy date fields after all (comment 1) - however the `mysqlUpgrade` log (attached) shows the upgrade itself was quick - it was just the backup that took a while. When I checked the instance just now, the parameter group was showing as "pending reboot", so I had to reboot it. Also, the event log shows: July 7, 2017 at 10:09:21 PM UTC+1 DB instance shutdown July 7, 2017 at 10:09:29 PM UTC+1 Backing up DB instance July 7, 2017 at 10:11:13 PM UTC+1 DB instance shutdown July 7, 2017 at 10:11:54 PM UTC+1 DB instance restarted July 7, 2017 at 10:12:43 PM UTC+1 Updated to use DBParameterGroup treeherder-mysql57 July 7, 2017 at 10:52:41 PM UTC+1 Finished DB Instance backup July 7, 2017 at 10:52:41 PM UTC+1 Database instance patched July 7, 2017 at 10:52:51 PM UTC+1 Backing up DB instance July 7, 2017 at 11:16:15 PM UTC+1 Finished DB Instance backup This makes me think that Terraform didn't apply all the changes as one action, but tired to first apply the parameter group before updating the version. I also noticed there is a bug relating to updating master+replica combos (like we use for prod): https://github.com/terraform-providers/terraform-provider-aws/issues/775 Given all of the above, I wonder whether we should just do stage/prod by hand (when the time comes), and then update the Terraform configs retrospectively? (Particularly given the need to do the `allow_major_version_upgrade` dance too.) (In reply to Ed Morley [:emorley] from comment #3) > Once we're done with this bug (switching all environments to 5.7), we can > also look into removing our custom sql_mode setting and using the new > stricter default in 5.7 Note to self: For MySQL 5.7 the AWS default parameter group actually overrides the new upstream MySQL defaults (presumably so customers don't blame them for breakage), so we can't remove the setting entirely.
(In reply to Ed Morley [:emorley] from comment #7) > Given all of the above, I wonder whether we should just do stage/prod by > hand (when the time comes), and then update the Terraform configs > retrospectively? (Particularly given the need to do the > `allow_major_version_upgrade` dance too.) Things appear to be working well on the dev instance, so I think we're ready to try stage soon. What are you thoughts on the above? :-)
Flags: needinfo?(jwatkins)
(In reply to Ed Morley [:emorley] from comment #8) > (In reply to Ed Morley [:emorley] from comment #7) > > Given all of the above, I wonder whether we should just do stage/prod by > > hand (when the time comes), and then update the Terraform configs > > retrospectively? (Particularly given the need to do the > > `allow_major_version_upgrade` dance too.) > > Things appear to be working well on the dev instance, so I think we're ready > to try stage soon. > > What are you thoughts on the above? :-) I'm fine with doing it either way. The `allow_major_version_upgrade` dance wasn't much of an inconvenience. I would personally opt for rolling the upgrade through terraform again since it worked fine with dev and there is no need to go back and match state after the fact. It's your call, I'm game for either. Just let me know when and/or drop a PR. :-)
Flags: needinfo?(jwatkins)
My main concern in comment 7 was that Terraform seemed to be applying the changes as two separate steps: 1) Changing parameter group on its own (but not properly applying it due to lack of restart) 2) Updating the MySQL version This left the prototype instance using the default parameter group until I restarted, which means the wrong DB collation (amongst other things) until I manually restarted the instance. I was under the impression that if done via the AWS console the two steps would normally be done as one action, avoiding this problem?
Commit pushed to master at https://github.com/mozilla/treeherder https://github.com/mozilla/treeherder/commit/ed34567863fba80eb27f519b6b3cc22366b6a378 Bug 1378433 - Skip test_mysqlclient_tls_enforced() on Travis Since we have to upgrade from MySQL 5.6 to 5.7 and there are still enough remnants behind that cause the test to fail. Multiple attempts have been made to clean them up to no avail, and since this will go away soon when we switch Travis to using Docker images, it's not worth spending any more time on it.
I initiated the upgrade on treeherder-stage rds starting at 9:13am PDT but terraform ended up timing out. The view from the console shows it is still in the process of upgrading. I'm getting the impression the upgrade is stuck at the moment. aws_db_instance.treeherder-stage-rds: timeout while waiting for state to become 'available' (last state: 'upgrading', timeout: 1h20m0s)
I believe the upgrade itself has completed, but RDS keeps the "upgrading" state until the backup has finished - at least judging from when dev was upgraded. Though the backup really should have completed by now, so not sure.
For posterity, Jake opened an AWS support ticket, whose response was: """ I have taken a look at your RDS instance treeherder-stage and can indeed see it is currently going through the update process. Currently a snapshot is being taken of your RDS instance as a security measure which is what is taking a majority of the time. Your RDS instance however should be usable in it's current state. In order for your RDS state to be in an available state, the current shapshot will need to coplete. Right now your snapshot is approximately at 56%. I can not provide an estimate however of the amount of time it may take for the snapshot to complete as this will depend on the RDS instance workload. I have confirmed however that the instance is healthy and this process is progressing as expected. ... """ That 56% was 4.5 hours after the update started. The backup finally finished 9 hours after the update started. Compare to the 40 minutes that it took for the dev instance (which is the same instance size and DB size). The good news is that New Relic shows the DB was only actually inaccessible for ~4 minutes, the rest was just the background backup. Note I still had to manually restart the instance for the parameter group to apply. Perhaps we need to set the "apply immediately" option when we do prod?
Attachment #8886571 - Flags: review?(jwatkins)
Attachment #8886571 - Flags: review+
Attachment #8886571 - Flags: checkin+
Everything looking ok for prototype/stage, I start the process for prod once I'm back from PTO next week.
Blocks: 1387258
Jake, I don't suppose we could try making this change sometime over the weekend, or else say early Monday, just in case there are any hiccups?
Attachment #8893697 - Flags: review?(jwatkins)
I can see about doing this tomorrow morning (EST) if that's acceptable
That would be great - thank you :-)
Depends on: 1388421
Other than cleaning up the now unused RDS treeherder custom mysql5.6 parameter group, we're all done here :-)
Attachment #8895329 - Flags: review?(jwatkins)
Attachment #8895329 - Flags: review?(klibby)
Comment on attachment 8895329 [details] [diff] [review] devservices-aws PR#63: Remove MySQL 5.6 specific parameter group Review of attachment 8895329 [details] [diff] [review]: ----------------------------------------------------------------- Terribly sorry for the delay to this review. Cleanup had to take a backseat to relops OKRs. Patch LGTM
Attachment #8895329 - Attachment is patch: true
Attachment #8895329 - Attachment mime type: text/x-github-pull-request → text/plain
Attachment #8895329 - Flags: review?(jwatkins) → review+
Comment on attachment 8895329 [details] [diff] [review] devservices-aws PR#63: Remove MySQL 5.6 specific parameter group Merged and applied
Attachment #8895329 - Flags: review?(klibby) → checkin+
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: