Closed
Bug 891128
Opened 11 years ago
Closed 11 years ago
Multiple services are offline in SCL3
Categories
(Infrastructure & Operations Graveyard :: NetOps: Other, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: fox2mike, Unassigned)
References
Details
(Whiteboard: [reit-ops][reit-b2g])
Still ongoing, working to resolve
Reporter | ||
Updated•11 years ago
|
Assignee: server-ops → shyam
Group: infra → mozilla-corporation-confidential
Reporter | ||
Updated•11 years ago
|
Summary: Multiple services outage in SCL3 → Multiple services are offline in SCL3
Reporter | ||
Comment 1•11 years ago
|
||
List of affected services:
1) Bugzilla (should be back online now)
2) hg.mozilla.org
3) git.mozilla.org
Group: mozilla-corporation-confidential
Updated•11 years ago
|
Whiteboard: [reit-ops][reit-b2g]
Comment 2•11 years ago
|
||
Done on bugzilla4 and the backups server and bugzilla1.db.phx1:

    slave stop;
    change master to master_host='bugzilla2.db.scl3.mozilla.com', master_log_pos=921472505, master_log_file='bugzilla2-bin.000322';
    slave start;

Needs to be done on bugzilla3 when it comes up, too. (bugzilla1 is already slaving bugzilla2.)

bugzilla1.db.phx1 can't replicate because of netflows:

    [root@bugzilla1.db.phx1 ~]# nc -vz bugzilla2.db.scl3.mozilla.com 3306
    nc: connect to bugzilla2.db.scl3.mozilla.com port 3306 (tcp) failed: Connection timed out
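For anyone who needs to repeat the `nc -vz` reachability check from a host without netcat, the following is a minimal sketch of the same TCP connect test in Python. The function name `can_connect` and the 5-second default timeout are illustrative assumptions, not part of any tooling used in this bug. It only confirms that a TCP handshake completes on the MySQL port; it says nothing about replication state.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper: roughly what `nc -vz host port` checks.
    """
    try:
        # create_connection resolves the name and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and name-resolution failures.
        return False


if __name__ == "__main__":
    # Host and port taken from the nc command in the comment above.
    host, port = "bugzilla2.db.scl3.mozilla.com", 3306
    state = "reachable" if can_connect(host, port) else "NOT reachable"
    print(f"{host}:{port} is {state}")
```

A `False` result from the replica host, as with the timed-out `nc` call, points at the network path (flows or firewall) rather than at MySQL itself.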
Whiteboard: [reit-ops][reit-b2g]
Updated•11 years ago
|
Whiteboard: [reit-ops][reit-b2g]
Reporter | ||
Updated•11 years ago
|
Assignee: shyam → ravi
Severity: blocker → normal
Component: Server Operations → NetOps: Other
Product: mozilla.org → Infrastructure & Operations
QA Contact: shyam → ravi
Comment 3•11 years ago
|
||
Switched back to bugzilla1.db.scl3 as the master; if this write succeeds then everything is OK again.
Comment 4•11 years ago
|
||
Trees closed: 14:47 PT
Trees opened: 18:16 PT
The Firefox 23.0b4 release was impacted; those builds have been revived and are now in progress. Details of the cleanup are in bug 891165. Like before, let's keep this bug open for root cause/postmortem.
Comment 5•11 years ago
|
||
We have AJTAC and BNG case 2013-0708-1012 open with Juniper and a call this morning to sync up on the issue. We believe this may be a similar issue to bug 826609. We believe we were able to collect full debug output and logs to help AJTAC identify a root cause.
Status: NEW → ASSIGNED
Updated•11 years ago
|
QA Contact: ravi → adam
Updated•11 years ago
|
Assignee: ravi → network-operations
Comment 6•11 years ago
|
||
Juniper's case remains open; however, as they have not been able to track down a root cause of the issue for the past year, we are electing to remove the technology. The depending bug is the tracker for that work. Previously, the netops roadmap had us installing this device in phx1 as well; however, that is no longer on the table due to the instability we've experienced in scl3, which is a similar layout.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Updated•2 years ago
|
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard