date, time, duration of maintenance:
Next maintenance window

system(s) affected:
generic1.db.scl3
generic3.db.phx1
buildbot1
builder-addons1
sentry1
tbpl1
db1.iddb
bugzilla1.db.scl3

end-user impact:
Databases will be unavailable for roughly 5-10 minutes for each server.

maintenance plan and timeline (link to a wiki or etherpad is fine):
This is only a server reboot, applying changes already deployed by puppet that have been tested and proven to work (adding noatime/nodiratime to the data volume).

rollback plan / rollback point (at which point will you determine to roll back):
If the system fails to reboot, we can PXE boot it.

notification mechanisms:
Normal maintenance window downtime.

who will be point, who else will be involved:
DB team will be point and handle all reboots. If any tree-closing apps can't re-establish their db connection safely, their team should be involved.
Per request during CAB: when we reboot generic3 in phx1, let's coordinate a shutdown of the etherpad app first, before the db goes down.
Tentatively approved for the next tree closing window Oct 12th. CC'ing some service owners so they know of potential impact.
Flags: cab-review? → cab-review+
We realized we did not actually need to perform a reboot. We were changing mountpoint options to be more efficient and doing it through puppet, and puppet remounts the directories right away. In tests, machines had no problems remounting /, so we just did it without rebooting.

All of the following were done today:
generic1.db.scl3
generic3.db.phx1
buildbot1
builder-addons1
sentry1
tbpl1
bugzilla1.db.scl3

This one was not done:
db1.iddb

It is the identity db and is not puppetized by us, and I was not about to live remount a system without having tested first (especially when I would have been remounting /).

We have a spreadsheet with what's done and not done at: https://docs.google.com/a/mozilla.com/spreadsheet/ccc?key=0AvGP1OghOtJSdC1FTnlTQmtxZVRkbG1NM1FlYkUtQlE&usp=drive_web#gid=0
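For anyone who wants to confirm that a live remount actually took effect on a host, here is a minimal sketch in Python. It is not what was run for this bug; it only parses text in the format of /proc/mounts and reports whether a mount point carries the noatime option. The function names and the /data mount point in the usage note are hypothetical. It checks for noatime only, on the assumption that nodiratime may not be listed separately when noatime is set.

```python
def mount_options(mounts_text, mount_point):
    """Return the set of mount options for mount_point, or None if absent.

    mounts_text is expected to be in /proc/mounts format:
    device mount_point fstype options dump pass
    """
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            # Later entries shadow earlier ones for the same mount point.
            options = set(fields[3].split(","))
    return options


def has_noatime(mounts_text, mount_point):
    """True if mount_point is mounted and its options include noatime."""
    opts = mount_options(mounts_text, mount_point)
    return opts is not None and "noatime" in opts
```

On a Linux host, this could be used by reading /proc/mounts into a string and calling has_noatime(text, "/") or has_noatime(text, "/data"), where /data stands in for whatever the data volume's real mount point is.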
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Product: mozilla.org → Infrastructure & Operations