Bug 528573 - Need to drop and reload the b01 database master and slave for disk space
Status: RESOLVED FIXED
Sun 01/03 12:00pm PST
Product: mozilla.org Graveyard
Classification: Graveyard
Component: Server Operations
: other
: All Other
: -- minor (vote)
: ---
Assigned To: Dave Miller [:justdave] (justdave@bugzilla.org)
: matthew zeier [:mrz]
Mentors:
Depends on:
Blocks:
Reported: 2009-11-13 12:19 PST by Dave Miller [:justdave] (justdave@bugzilla.org)
Modified: 2015-03-12 08:17 PDT
justdave: needs‑downtime+
See Also:
QA Whiteboard:
Iteration: ---
Points: ---
Description Dave Miller [:justdave] (justdave@bugzilla.org) 2009-11-13 12:19:19 PST
The b01 database cluster recently had a database removed that was using about 120 GB of disk space.  Unfortunately it was InnoDB, and InnoDB's storage only grows; it never shrinks.  In order to reclaim the disk space, we need to completely drop the entire data storage and reload it from scratch via mysqldump and import.  This requires taking down both the master and the slave and reloading both of them individually.  I expect the process to take between 30 and 60 minutes on each server.
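For readers unfamiliar with this kind of reload, here is a rough sketch of the dump, wipe, and reload sequence described above, written as a Python helper that only builds the command list and runs nothing. The paths, the choice of `mysql_install_db` and `mysqld_safe`, and the step order are assumptions for illustration, not the commands actually used on b01, and replication setup for the slave is left out entirely.

```python
import shlex


def reload_plan(datadir: str, dump_file: str) -> list[tuple[str, str]]:
    """Return (description, shell command) pairs for a full dump and reload.

    Nothing is executed here. All paths are hypothetical, and the way the
    server is started and stopped will differ from site to site.
    """
    d = shlex.quote(datadir)
    old = shlex.quote(datadir.rstrip("/") + ".old")
    f = shlex.quote(dump_file)
    return [
        ("dump every database, including stored routines",
         f"mysqldump --all-databases --routines --result-file={f}"),
        ("stop the server",
         "mysqladmin shutdown"),
        # Or remove it outright if the disk cannot hold both copies.
        ("set the old data directory aside",
         f"mv {d} {old}"),
        ("create a fresh, empty data directory",
         f"mysql_install_db --datadir={d}"),
        ("start the server again (site-specific)",
         f"mysqld_safe --datadir={d} &"),
        ("reload the dump",
         f"mysql < {f}"),
        ("make the server re-read the reloaded grant tables",
         "mysqladmin flush-privileges"),
    ]


if __name__ == "__main__":
    plan = reload_plan("/var/lib/mysql", "/backup/b01.sql")
    for number, (what, command) in enumerate(plan, 1):
        print(f"{number}. {what}: {command}")
```

Printing the plan instead of executing it keeps the sketch safe to run anywhere; each command would still need to be checked against the MySQL version and init scripts in use.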
Comment 1 Dave Miller [:justdave] (justdave@bugzilla.org) 2009-11-13 12:20:04 PST
This should not be scheduled the same night as the kernel upgrades because I won't have time to deal with both at once.
Comment 2 matthew zeier [:mrz] 2009-11-13 22:02:18 PST
What if someone else is handling kernel upgrades?
Comment 3 Dave Miller [:justdave] (justdave@bugzilla.org) 2009-11-14 09:04:35 PST
Sure.  This will actually take a while to run on both ends, so I guess I can start it and go work on kernels while I wait for it to finish.
Comment 4 Shyam Mani [:fox2mike] 2009-11-14 09:46:44 PST
I'd be glad to give Dave a hand with anything he needs.
Comment 5 Dave Miller [:justdave] (justdave@bugzilla.org) 2009-11-14 10:38:24 PST
this will affect graphs.mozilla.org for the downtime...
Comment 6 John O'Duinn [:joduinn] (please use "needinfo?" flag) 2009-11-14 18:17:16 PST
(In reply to comment #5)
> this will affect graphs.mozilla.org for the downtime...

Does this mean that http posts to graphs.m.o would fail out? If so, talos jobs will fail out - we'd need to coordinate closing the tree for the duration.

When are you thinking of doing this? And approx how long would it take?
Comment 7 Dave Miller [:justdave] (justdave@bugzilla.org) 2009-11-14 19:02:59 PST
(In reply to comment #6)
> (In reply to comment #5)
> > this will affect graphs.mozilla.org for the downtime...
> 
> Does this mean that http posts to graphs.m.o would fail out? If so, talos jobs
> will fail out - we'd need to coordinate closing the tree for the duration.

Yes.

> When are you thinking of doing this? And approx how long would it take?

Tuesday night, but I'm open to changing that around to accommodate.  I suspect it'll take somewhere between 15 and 60 minutes.
Comment 8 John O'Duinn [:joduinn] (please use "needinfo?" flag) 2009-11-15 11:09:28 PST
(In reply to comment #7)
> (In reply to comment #6)
> > (In reply to comment #5)
> > > this will affect graphs.mozilla.org for the downtime...
> > 
> > Does this mean that http posts to graphs.m.o would fail out? If so, talos jobs
> > will fail out - we'd need to coordinate closing the tree for the duration.
> Yes.
ok.

> > When are you thinking of doing this? And approx how long would it take?
> Tuesday night, but I'm open to changing that around to accommodate.  I suspect
> it'll take somewhere between 15 and 60 minutes.
We already have a Talos downtime scheduled for 9am-11am Monday (see dev.planning "Talos downtime, Monday November 16th 9-11am PST"). It would be great if we could (safely) do all this in one downtime. Would your db work be ready to go ride-along Monday morning?
Comment 9 Dave Miller [:justdave] (justdave@bugzilla.org) 2009-11-15 23:14:19 PST
This will also affect the following sites:

bonsai
buildbot
despot
graphs_mozilla_org
litmus
viewvc_svn
MDC
Comment 10 matthew zeier [:mrz] 2009-11-27 15:58:22 PST
Guessing this didn't happen on the 16th.  When can this happen this week?
Comment 11 Dave Miller [:justdave] (justdave@bugzilla.org) 2009-12-02 10:18:37 PST
We had a bunch of miscommunication about this this morning...  we were apparently going to try to do this at 9am this morning, but no downtime notice got sent for it.  The decision to do 9am happened in the afternoon yesterday, though.  Because of MDC, SVN, and Litmus being affected, we either need to do this in a normal Tuesday/Thursday downtime window, or have 24 hours or more advance notice when the downtime notice goes out if we do it outside of one of those windows, because this will be a (rather long) user-facing outage.
Comment 12 Dave Miller [:justdave] (justdave@bugzilla.org) 2009-12-17 15:14:24 PST
So when's a good time for build for us to do this?
Comment 13 John O'Duinn [:joduinn] (please use "needinfo?" flag) 2009-12-17 15:36:47 PST
per beltzner: It's blocked waiting until after we do the 3.6RC builds.
Comment 14 Dave Miller [:justdave] (justdave@bugzilla.org) 2009-12-18 12:53:05 PST
whenever we get to this, we probably want to try to do bug 535859 at the same time.
Comment 15 Dave Miller [:justdave] (justdave@bugzilla.org) 2010-01-03 00:52:42 PST
reminder to myself: want to reconfig innodb for innodb_file_per_table while we do this, which will prevent this situation from coming up again in the future.
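As a minimal sketch of that setting, the snippet below renders the `my.cnf` fragment that turns on `innodb_file_per_table`. Placing it under `[mysqld]` is standard, but the helper name is made up for illustration and nothing here reflects the actual b01 configuration. The option only affects tables created after it is enabled, which is why a full reload is a convenient moment to switch it on.

```python
import configparser
import io


def file_per_table_fragment() -> str:
    """Render a my.cnf fragment that enables one tablespace file per table."""
    cfg = configparser.ConfigParser()
    # With this option on, each new InnoDB table gets its own .ibd file,
    # so dropping a table or database returns its space to the filesystem.
    cfg["mysqld"] = {"innodb_file_per_table": "1"}
    buffer = io.StringIO()
    cfg.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    print(file_per_table_fragment(), end="")
```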
Comment 16 Dave Miller [:justdave] (justdave@bugzilla.org) 2010-01-03 13:22:13 PST
reload began around 12:20pm (right after the firmware updates from bug 535859 completed).  As of now, it's still in progress.  tm-b01-slave01 will be around 15 minutes later than master01 coming back, probably, because I forgot to run screen first, and got disconnected 10 minutes in... :|  master01 was running in screen, fortunately (I did get disconnected from both at the same time).
Comment 17 Dave Miller [:justdave] (justdave@bugzilla.org) 2010-01-03 13:56:05 PST
master's up, slave should be done any minute now.
Comment 18 Dave Miller [:justdave] (justdave@bugzilla.org) 2010-01-03 13:59:26 PST
and the slave is up.  All done.  Just under the wire. :)