Bug 528573
Opened 15 years ago
Closed 14 years ago
Need to drop and reload the b01 database master and slave for disk space
Categories
(mozilla.org Graveyard :: Server Operations, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: justdave, Assigned: justdave)
Details
(Whiteboard: Sun 01/03 12:00pm PST)
The b01 database cluster recently had a database removed that was using about 120 GB of disk space. Unfortunately it was InnoDB, too, and the InnoDB tablespace only grows; it never shrinks. In order to reclaim the disk space, we need to completely drop the entire data storage and reload it from scratch via mysqldump and import. This requires taking down both the master and the slave and reloading each of them individually. I expect the process to take between 30 and 60 minutes on each server.
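For context, a dump-and-reload of this kind usually looks something like the following. This is a hedged sketch only: the file paths, init commands, and option choices are illustrative assumptions, not the exact commands used in this maintenance.

```shell
# Illustrative sketch; paths and options are assumptions, not the
# actual commands from this maintenance.

# 1. Dump everything from the master. --single-transaction takes a
#    consistent InnoDB snapshot; --master-data=2 records the binlog
#    position (as a comment) so the slave can be repointed afterwards.
mysqldump --all-databases --single-transaction --master-data=2 \
    > /backups/b01-full.sql

# 2. Stop mysqld and remove the shared InnoDB tablespace. The ibdata
#    file only ever grows, so deleting it is the only way to give the
#    space back to the filesystem.
/etc/init.d/mysql stop
rm -f /var/lib/mysql/ibdata1 /var/lib/mysql/ib_logfile*

# 3. Start mysqld (it creates a fresh, small ibdata file) and reload.
/etc/init.d/mysql start
mysql < /backups/b01-full.sql
```

The same import is then repeated on the slave, which is why the estimate is 30 to 60 minutes per server rather than one combined window.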
Comment 1•15 years ago
This should not be scheduled the same night as the kernel upgrades because I won't have time to deal with both at once.
Flags: needs-downtime+
Comment 2•15 years ago
What if someone else is handling kernel upgrades?
Comment 3•15 years ago
Sure. This will actually take a while to run on both ends, so I can start it and then go work on kernels while I wait for it to finish, I guess.
Comment 4•15 years ago
I'd be glad to give Dave a hand with anything he needs.
Comment 5•15 years ago
This will affect graphs.mozilla.org for the duration of the downtime...
Updated•15 years ago
Assignee: server-ops → justdave
Comment 6•15 years ago
(In reply to comment #5)
> this will affect graphs.mozilla.org for the downtime...

Does this mean that http posts to graphs.m.o would fail out? If so, talos jobs will fail out - we'd need to coordinate closing the tree for the duration.

When are you thinking of doing this? And approx how long would it take?
Comment 7•15 years ago
(In reply to comment #6)
> (In reply to comment #5)
> > this will affect graphs.mozilla.org for the downtime...
>
> Does this mean that http posts to graphs.m.o would fail out? If so, talos jobs
> will fail out - we'd need to coordinate closing the tree for the duration.

Yes.

> When are you thinking of doing this? And approx how long would it take?

Tuesday night, but I'm open to changing that around to accommodate. I suspect it'll take somewhere between 15 and 60 minutes.
Comment 8•15 years ago
(In reply to comment #7)
> (In reply to comment #6)
> > (In reply to comment #5)
> > > this will affect graphs.mozilla.org for the downtime...
> >
> > Does this mean that http posts to graphs.m.o would fail out? If so, talos jobs
> > will fail out - we'd need to coordinate closing the tree for the duration.
>
> Yes.

ok.

> > When are you thinking of doing this? And approx how long would it take?
>
> Tuesday night, but I'm open to changing that around to accommodate. I suspect
> it'll take somewhere between 15 and 60 minutes.

We already have a Talos downtime scheduled for 9am-11am Monday (see dev.planning "Talos downtime, Monday November 16th 9-11am PST"). It would be great if we could (safely) do all this in one downtime. Would your db work be ready to go ride-along Monday morning?
Comment 9•15 years ago
This will also affect the following sites:
bonsai
buildbot
despot
graphs_mozilla_org
litmus
viewvc_svn
MDC
Comment 10•15 years ago
Guessing this didn't happen on the 16th. When can this happen this week?
Updated•15 years ago
Whiteboard: 12/03/2009 @ 8pm
Comment 11•15 years ago
We had a bunch of miscommunication about this this morning... we were apparently going to try to do this at 9am, but no downtime notice got sent for it. The decision to do 9am wasn't made until yesterday afternoon, though. Because MDC, SVN, and Litmus are affected, we either need to do this in a normal Tuesday/Thursday downtime window, or send the downtime notice 24 hours or more in advance if we do it outside one of those windows, because this will be a (rather long) user-facing outage.
Updated•15 years ago
Assignee: justdave → mrz
Whiteboard: 12/03/2009 @ 8pm → 12/17/2009 @ 8pm
Updated•15 years ago
Assignee: mrz → justdave
Comment 12•15 years ago
So when's a good time for build for us to do this?
Comment 13•15 years ago
Per beltzner: it's blocked waiting until after we do the 3.6 RC builds.
Updated•15 years ago
Whiteboard: 12/17/2009 @ 8pm → "Really Soon Now" according to joduinn
Comment 14•15 years ago
Whenever we get to this, we probably want to try to do bug 535859 at the same time.
Updated•14 years ago
Whiteboard: "Really Soon Now" according to joduinn → Sun 01/03 12:00pm PST
Comment 15•14 years ago
Reminder to myself: we want to reconfigure InnoDB for file_per_table while we do this, which will prevent this situation from coming up again in the future.
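The setting referred to here is innodb_file_per_table, which puts each InnoDB table in its own .ibd file instead of the shared ibdata tablespace, so dropping a table or database actually returns disk space to the filesystem. A minimal my.cnf fragment (the section placement is standard; treat this as an illustrative sketch rather than the actual b01 config):

```ini
[mysqld]
# Give each InnoDB table its own .ibd file so that DROP TABLE /
# DROP DATABASE frees the underlying disk space.
innodb_file_per_table = 1
```

The setting only affects tables created (or rebuilt) after it is enabled, which is why it pairs naturally with a full dump and reload like this one.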
Comment 16•14 years ago
Reload began around 12:20pm (right after the firmware updates from bug 535859 completed). As of now, it's still in progress. tm-b01-slave01 will probably be about 15 minutes behind master01 in coming back, because I forgot to run screen first and got disconnected 10 minutes in... :| Fortunately, master01 was running in screen (I did get disconnected from both at the same time).
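The screen slip mentioned here is the classic lesson for long reloads: run the import inside a screen session so an SSH disconnect doesn't kill it. A sketch of the pattern, with a made-up session name and backup path:

```shell
# Start a named screen session and run the long import inside it
# ("b01-reload" and the dump path are hypothetical).
screen -S b01-reload
mysql < /backups/b01-full.sql

# If the SSH connection drops, the import keeps running inside the
# detached session. Reattach from a new login with:
screen -r b01-reload
```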
Comment 17•14 years ago
Master's up; slave should be done any minute now.
Comment 18•14 years ago
And the slave is up. All done. Just under the wire. :)
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Updated•9 years ago
|
Product: mozilla.org → mozilla.org Graveyard