
Need to drop and reload the b01 database master and slave for disk space

Status

RESOLVED FIXED

Product: mozilla.org Graveyard
Component: Server Operations
Priority: --
Severity: minor
Reported: 8 years ago
Last modified: 2 years ago

People

(Reporter: justdave, Assigned: justdave)

Tracking

Bug Flags:
needs-downtime +

Details

(Whiteboard: Sun 01/03 12:00pm PST)

The b01 database cluster recently had a database removed that was using about 120 GB of disk space.  Unfortunately it was InnoDB, and the shared InnoDB tablespace only grows; it never shrinks.  To reclaim the disk space, we need to drop the entire data store and reload it from scratch via mysqldump and import.  This requires taking down both the master and the slave and reloading each of them individually.  I expect the process to take between 30 and 60 minutes on each server.
This should not be scheduled the same night as the kernel upgrades because I won't have time to deal with both at once.
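The drop-and-reload cycle described above can be sketched roughly as follows. This is an illustrative sketch only: the hostnames, file paths, and service names are assumptions, not taken from this bug, and with DRY_RUN=1 (the default) the commands are printed rather than executed.

```shell
#!/bin/sh
# Hedged sketch of the dump-and-reload cycle; paths and service names are
# illustrative. With DRY_RUN=1 (the default) commands are printed, not run.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# 1. Take a full logical dump of the master while writes are stopped.
run sh -c 'mysqldump --all-databases --single-transaction > /backup/b01-full.sql'

# 2. Stop mysqld and delete the bloated shared InnoDB tablespace files.
run service mysqld stop
run rm -f /var/lib/mysql/ibdata1 /var/lib/mysql/ib_logfile0 /var/lib/mysql/ib_logfile1

# 3. Restart with a freshly created tablespace and re-import the dump.
#    (Running this step inside screen(1) keeps it alive across SSH
#    disconnects, which matters for a 30-60 minute operation.)
run service mysqld start
run sh -c 'mysql < /backup/b01-full.sql'
```

The same cycle is then repeated on the slave, which is why the estimate above is per server.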
Flags: needs-downtime+

Comment 2

8 years ago
What if someone else is handling kernel upgrades?
Sure.  This will actually take a while to run on both ends; I can start it and go work on kernels while I wait for it to finish, I guess.
I'd be glad to give Dave a hand with anything he needs.
this will affect graphs.mozilla.org for the downtime...
Assignee: server-ops → justdave
(In reply to comment #5)
> this will affect graphs.mozilla.org for the downtime...

Does this mean that http posts to graphs.m.o would fail out? If so, talos jobs will fail out - we'd need to coordinate closing the tree for the duration.

When are you thinking of doing this? And approx how long would it take?
(In reply to comment #6)
> (In reply to comment #5)
> > this will affect graphs.mozilla.org for the downtime...
> 
> Does this mean that http posts to graphs.m.o would fail out? If so, talos jobs
> will fail out - we'd need to coordinate closing the tree for the duration.

Yes.

> When are you thinking of doing this? And approx how long would it take?

Tuesday night, but I'm open to changing that around to accommodate.  I suspect it'll take somewhere between 15 and 60 minutes.
(In reply to comment #7)
> (In reply to comment #6)
> > (In reply to comment #5)
> > > this will affect graphs.mozilla.org for the downtime...
> > 
> > Does this mean that http posts to graphs.m.o would fail out? If so, talos jobs
> > will fail out - we'd need to coordinate closing the tree for the duration.
> Yes.
ok.

> > When are you thinking of doing this? And approx how long would it take?
> Tuesday night, but I'm open to changing that around to accomodate.  I suspect
> it'll take somewhere between 15 and 60 minutes.
We already have a Talos downtime scheduled for 9am-11am Monday (see dev.planning "Talos downtime, Monday November 16th 9-11am PST"). It would be great if we could (safely) do all this in one downtime. Would your db work be ready to go ride-along Monday morning?
This will also affect the following sites:

bonsai
buildbot
despot
graphs_mozilla_org
litmus
viewvc_svn
MDC
Guessing this didn't happen on the 16th.  When can this happen this week?

Updated

7 years ago
Whiteboard: 12/03/2009 @ 8pm
There was a lot of miscommunication about this this morning: we were apparently going to try to do it at 9am, but no downtime notice was sent, and the decision to do 9am was only made yesterday afternoon.  Because MDC, SVN, and Litmus are affected, we either need to do this in a normal Tuesday/Thursday downtime window, or give at least 24 hours' advance notice when the downtime notice goes out if we do it outside one of those windows, because this will be a (rather long) user-facing outage.

Updated

7 years ago
Assignee: justdave → mrz
Whiteboard: 12/03/2009 @ 8pm → 12/17/2009 @ 8pm

Updated

7 years ago
Assignee: mrz → justdave
So when's a good time for the build team for us to do this?
per beltzner: It's blocked waiting until after we do the 3.6RC builds.
Whiteboard: 12/17/2009 @ 8pm → "Really Soon Now" according to joduinn
Whenever we get to this, we probably want to try to do bug 535859 at the same time.
Whiteboard: "Really Soon Now" according to joduinn → Sun 01/03 12:00pm PST
Reminder to myself: I want to reconfigure InnoDB for file_per_table while we do this, which will prevent this situation from coming up again in the future.
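With innodb_file_per_table enabled, each InnoDB table is stored in its own .ibd file, so dropping a table or database returns its disk space to the filesystem instead of leaving it trapped in the ever-growing shared ibdata1 tablespace. A minimal my.cnf fragment (illustrative, not the actual b01 configuration):

```ini
# /etc/my.cnf (fragment) -- illustrative, not the real b01 config.
[mysqld]
# Store each InnoDB table in its own .ibd file so that DROP TABLE/DATABASE
# returns the space to the OS instead of growing the shared tablespace.
innodb_file_per_table = 1
```

Note that the setting only applies to tables created (or rebuilt) after it is enabled, which is exactly the opportunity the dump-and-reload in this bug provides.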
The reload began around 12:20pm (right after the firmware updates from bug 535859 completed) and is still in progress.  tm-b01-slave01 will probably come back around 15 minutes later than master01, because I forgot to run screen first and got disconnected 10 minutes in... :|  Fortunately master01 was running in screen (I did get disconnected from both at the same time).
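The disconnect problem described above is avoided by running the long import inside a detached screen session. A hedged sketch (the session name and file path are illustrative; with DRY_RUN=1, the default, the command is printed rather than executed):

```shell
#!/bin/sh
# Sketch: run the long-running import inside a detached screen session so an
# SSH disconnect does not kill it. Names are illustrative, not from the bug.
DRY_RUN=${DRY_RUN:-1}
start_reload() {
    # -S names the session, -dm starts it detached in the background;
    # reattach later with:  screen -r b01-reload
    cmd="screen -S b01-reload -dm sh -c 'mysql < /backup/b01-full.sql'"
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $cmd"
    else
        eval "$cmd"
    fi
}
start_reload
```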
master's up, slave should be done any minute now.
and the slave is up.  All done.  Just under the wire. :)
Status: NEW → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard