Closed
Bug 917856
Opened 11 years ago
Closed 11 years ago
Request to reboot tree-closing database servers during next maintenance window
Categories
(Infrastructure & Operations :: Change Requests, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: bjohnson, Unassigned)
References
Details
date, time, duration of maintenance:
Next maintenance window

system(s) affected:
generic1.db.scl3
generic3.db.phx1
buildbot1
builder-addons1
sentry1
tbpl1
db1.iddb
bugzilla1.db.scl3

end-user impact:
Databases will be unavailable for roughly 5-10 minutes for each server.

maintenance plan and timeline (link to a wiki or etherpad is fine):
This is only a server reboot, applying changes already deployed by puppet that have been tested and proven to work (adding noatime/nodiratime to the data volume).

rollback plan / rollback point (at which point will you determine to roll back):
If the system fails to reboot, we can PXE boot it.

notification mechanisms:
Normal maintenance window downtime.

who will be point, who else will be involved:
DB team will be point and handle all reboots. If any tree-closing apps can't re-establish their db connection safely, their team should be involved.
Updated by Reporter • 11 years ago
Flags: cab-review?
Comment 1 (Reporter) • 11 years ago
Per request during CAB: when we reboot generic3 in phx1, let's coordinate a shutdown of the etherpad app first, before the db goes down.
Comment 2•11 years ago
Tentatively approved for the next tree-closing window, Oct 12th. CC'ing some service owners so they know of the potential impact.
Group: infra
Flags: cab-review? → cab-review+
Comment 3•11 years ago
We realized we did not actually need to perform a reboot. We were changing mountpoint options to be more efficient, and because the change goes through puppet, puppet remounts the directories right away. In tests, machines had no problems remounting /, so we just did it without rebooting.

All of the following were done today:
generic1.db.scl3
generic3.db.phx1
buildbot1
builder-addons1
sentry1
tbpl1
bugzilla1.db.scl3

This one was not done:
db1.iddb

It is the identity db and is not puppetized by us, and I was not about to live-remount a system without having tested first (especially when I would have been remounting /).

We have a spreadsheet with what's done and not done at: https://docs.google.com/a/mozilla.com/spreadsheet/ccc?key=0AvGP1OghOtJSdC1FTnlTQmtxZVRkbG1NM1FlYkUtQlE&usp=drive_web#gid=0
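One way to confirm that a live remount actually picked up noatime/nodiratime is to read the active options from the kernel's mount table. The following is a minimal sketch in Python, assuming a Linux host with /proc/mounts-style text as input. The function names and the sample mount lines are hypothetical and are not taken from this bug or from the puppet manifests it refers to.

```python
def parse_mounts(text):
    """Map each mount point to its set of options, from /proc/mounts-style text."""
    mounts = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, options = fields[1], fields[3]
        # A later entry for the same mount point replaces an earlier one.
        mounts[mount_point] = set(options.split(","))
    return mounts


def missing_options(text, mount_point, wanted=("noatime", "nodiratime")):
    """Return the wanted options that are not active on mount_point."""
    mounts = parse_mounts(text)
    if mount_point not in mounts:
        raise KeyError("mount point not found: " + mount_point)
    return [opt for opt in wanted if opt not in mounts[mount_point]]


# Hypothetical sample input: one volume already remounted, one not.
SAMPLE = (
    "/dev/sda1 / ext4 rw,noatime,nodiratime 0 0\n"
    "/dev/sdb1 /data ext4 rw,relatime 0 0\n"
)

print(missing_options(SAMPLE, "/"))
print(missing_options(SAMPLE, "/data"))
```

On a real host, the contents of /proc/mounts would be passed in place of SAMPLE. Note that mount(8) documents nodiratime as implied by noatime, so a volume mounted with only noatime may be acceptable even if this check reports nodiratime as missing.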
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Updated•10 years ago
Product: mozilla.org → Infrastructure & Operations
Updated•9 years ago
Change Request: --- → approved
Flags: cab-review+