Closed Bug 571497 Opened 15 years ago Closed 15 years ago

Rebuild SUMO database cluster.

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: jsocol, Assigned: tellis)

Details

Attachments

(1 file)

Just the schema (no data) for support_mozilla_com database. (15 years ago, timellis, 159.27 KB, text/x-sql)

James Socol [:jsocol, :james]

Reporter

Description

15 years ago

We need to figure out why the SUMO sessions table decided to grow so quickly, and come up with a plan to return disk space to the filesystem.

Tim, I'm not sure we ever really found out what was up with the session garbage collection query we were hunting for. Is it possible it got turned off?

Shyam, is it possible PHP's session.auto_start[1] is turned on on some of the generic boxes?

[1] http://www.php.net/manual/en/session.configuration.php#ini.session.auto-start

Shyam Mani [:fox2mike]

Comment 1

15 years ago

(In reply to comment #0)
> Shyam, is it possible PHP's session.auto_start[1] is turned on on some of the
> generic boxes?

[root@mradm02 wiki.mozilla.org]# /data/bin/issue-multi-command.py generic 'grep session.auto_start /etc/php.ini'
===pm-app-generic03===
session.auto_start = 0
===pm-app-generic04===
session.auto_start = 0
===pm-app-generic01===
session.auto_start = 0
===pm-app-generic06===
session.auto_start = 0
===pm-app-generic05===
session.auto_start = 0
===pm-app-generic02===
session.auto_start = 0
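For a larger pool of hosts, output in this shape could be checked mechanically rather than by eye. The following is only a sketch: it assumes the `===host===` separator format shown above, and the function names and the set of "off" spellings are illustrative choices, not part of any tool mentioned in this bug.

```python
import re

# Spellings treated as "disabled" here; an assumption made for this sketch.
OFF_VALUES = {"0", "off", "false", "no"}


def parse_multi_command_output(text):
    """Map each host to the lines it printed, given '===host===' separators."""
    results = {}
    current = None
    # The capture group makes re.split keep the separators in the token list.
    for token in re.split(r"(===[^=\s]+===)", text):
        token = token.strip()
        if not token:
            continue
        match = re.fullmatch(r"===([^=\s]+)===", token)
        if match:
            current = match.group(1)
            results[current] = []
        elif current is not None:
            results[current].extend(
                line.strip() for line in token.splitlines() if line.strip()
            )
    return results


def hosts_with_auto_start(results):
    """Return, sorted, the hosts that set session.auto_start to a non-off value."""
    flagged = []
    for host, lines in results.items():
        values = [
            line.split("=", 1)[1].strip().lower()
            for line in lines
            if line.startswith("session.auto_start") and "=" in line
        ]
        if any(value not in OFF_VALUES for value in values):
            flagged.append(host)
    return sorted(flagged)
```

Feeding the output above through `parse_multi_command_output` and then `hosts_with_auto_start` would give an empty list, since every generic box reports `session.auto_start = 0`.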

Phong Tran [:phong]

Updated

15 years ago

Assignee: server-ops → shyam

Shyam Mani [:fox2mike]

Comment 2

15 years ago

Passing to Tim, this has more to do with the DB than anything else.

Assignee: shyam → tellis

James Socol [:jsocol, :james]

Reporter

Comment 3

15 years ago

Tim, I've noticed that replication lag (in munin) has been a lot less stable since sometime Tuesday morning. I wonder if that's related?

timellis

Assignee

Comment 4

15 years ago

The cron job is still there (just checked it yesterday, actually). I'll ensure it's still running.

timellis

Assignee

Comment 5

15 years ago

I cannot manage to delete any sessions without getting a lock wait timeout. I am going to try truncating the sessions table to see if that will get it back on track.

timellis

Assignee

Comment 6

15 years ago

So the sessions table filled up. We don't have innodb_file_per_table enabled on the cluster, so space freed inside InnoDB is not returned to the filesystem. The sessions table can't be emptied via normal means, because something is wrong with it. A number of things point to this cluster just needing to be rebuilt. This will take a number of hours, and so should happen in the next outage window. Here's the general plan:

(1) Truncate the sessions table.
(2) mysqldump the whole support_mozilla_com database on master.
(3) Copy the mysqldump to all slaves.
(4) Drop the database on master and all slaves.
(5) Recreate the database on master and all slaves.
(6) Do all this in Phoenix.

I estimate the total time at about 3 hours.
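Steps (1) through (5) of the plan can be written out as an ordered, dry-run command list for review before the outage. This is a sketch under stated assumptions, not the commands that were actually run: the host names, the dump path, and reading step (5) as "create the database, then load the dump" are all hypothetical, and replication handling is left out entirely. Nothing is executed; the function only returns `(host, command)` pairs.

```python
import shlex

DB = "support_mozilla_com"  # database name taken from the bug


def rebuild_plan(master, slaves, dump_path="/tmp/sumo_dump.sql"):
    """Return (host, shell command) pairs in plan order, without running anything.

    `master`, `slaves`, and `dump_path` are hypothetical placeholders.
    """
    q = shlex.quote
    hosts = [master] + list(slaves)
    plan = [
        # (1) Truncate the sessions table on the master.
        (master, f"mysql --execute={q('TRUNCATE TABLE sessions')} {DB}"),
        # (2) Dump the whole database on the master.
        (master, f"mysqldump --single-transaction --result-file={q(dump_path)} {DB}"),
    ]
    # (3) Copy the dump to every slave.
    plan += [(master, f"scp {q(dump_path)} {q(s)}:{q(dump_path)}") for s in slaves]
    # (4) Drop the database everywhere.
    plan += [(h, f"mysql --execute={q('DROP DATABASE ' + DB)}") for h in hosts]
    # (5) Recreate the database everywhere and load the dump (an interpretation).
    for h in hosts:
        plan.append((h, f"mysql --execute={q('CREATE DATABASE ' + DB)}"))
        plan.append((h, f"mysql {DB} < {q(dump_path)}"))
    return plan
```

Printing the pairs for one master and its slaves gives a checklist that can be read through line by line before the window opens.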

timellis

Assignee

Updated

15 years ago

Severity: critical → blocker

Summary: Investigate SUMO session spike → Rebuild SUMO database cluster.

timellis

Assignee

Comment 7

15 years ago

Bouncing mysqld cleared up the horked lock problem on sessions, so we can delete them. That doesn't fix the InnoDB space-taken problem, though, so we still need this outage. Also, it will fix the schemas being out-of-sync between master and slaves.

timellis

Assignee

Comment 8

15 years ago

Attached file Just the schema (no data) for support_mozilla_com database. — Details

James Socol [:jsocol, :james]

Reporter

Comment 9

15 years ago

Is this outage still planned for Tuesday night? If so is it in the downtime notice for Tuesday?

[:Cww]

Comment 10

15 years ago

Best time for this outage: ~ 7 PM pacific in terms of minimal traffic.

timellis

Assignee

Comment 11

15 years ago

Outage over. I'll check up in the morn to see if things are progressing well.

timellis

Assignee

Comment 12

15 years ago

Aftermath: database sizes much reduced. Failed to get innodb_file_per_table done in the outage window. Sessions no longer piling up (they were at 9M before the outage, and about 7k now, and I see them growing/shrinking as the sessions delete script does its job). It seems this is a win. I'm resolving fixed.
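The sessions delete script itself is not shown in this bug. As an illustration of the general technique only, expired rows can be removed in small batches so that each transaction holds its locks briefly. The sketch below uses Python's built-in `sqlite3` as a stand-in for MySQL so that it is runnable as written; the `expiry` column and the function name are hypothetical.

```python
import sqlite3


def purge_expired_sessions(conn, cutoff, batch_size=1000):
    """Delete sessions with expiry < cutoff, committing after every batch.

    Returns the total number of rows deleted. The `sessions` table layout
    (an integer `expiry` column) is an assumption made for this sketch.
    """
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM sessions WHERE rowid IN ("
            "SELECT rowid FROM sessions WHERE expiry < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()  # end the transaction so locks are released between batches
        if cur.rowcount == 0:
            break
        total += cur.rowcount
    return total
```

On MySQL the same idea is usually expressed with `DELETE ... LIMIT n` run in a loop, rather than the `rowid` subquery used here for SQLite.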

Status: NEW → RESOLVED

Closed: 15 years ago

Resolution: --- → FIXED

James Socol [:jsocol, :james]

Reporter

Comment 13

15 years ago

(In reply to comment #12)
> It seems this is a win. I'm resolving fixed.

Great, thanks Tim! Does this also resolve the schema bug?

timellis

Assignee

Comment 14

15 years ago

It does. Do you want me to mysqldump out a schema from anywhere to help you verify that?

James Socol [:jsocol, :james]

Reporter

Comment 15

15 years ago

(In reply to comment #14)
> It does. Do you want me to mysqldump out a schema from anywhere to help you
> verify that?

Sure, if you could. Attach it to the schema-fixing bug?
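One way to verify that master and slave schemas match is to take a schema-only dump from each host (for example with `mysqldump --no-data`) and diff the two texts. The sketch below is an assumption about how that comparison might be done, not a tool used in this bug; it ignores comment lines and `AUTO_INCREMENT` counters, which can differ between hosts without any structural difference.

```python
import difflib
import re


def normalize_schema(dump_text):
    """Drop lines and values that differ between hosts for non-structural reasons."""
    kept = []
    for line in dump_text.splitlines():
        if line.startswith("--") or not line.strip():
            continue  # SQL comments (host name, dump date) and blank lines
        # AUTO_INCREMENT counters reflect the data, not the table structure.
        kept.append(re.sub(r" AUTO_INCREMENT=\d+", "", line.rstrip()))
    return kept


def schema_diff(master_dump, slave_dump):
    """Return unified-diff lines between two schema dumps; empty if they match."""
    return list(
        difflib.unified_diff(
            normalize_schema(master_dump),
            normalize_schema(slave_dump),
            fromfile="master",
            tofile="slave",
            lineterm="",
        )
    )
```

An empty result from `schema_diff` means the two dumps describe the same tables after normalization; any returned lines show where the slave differs from the master.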

Nobody; OK to take it and work on it

Updated

11 years ago

Product: mozilla.org → mozilla.org Graveyard


Categories

(mozilla.org Graveyard :: Server Operations, task)
