Closed Bug 795157 Opened 12 years ago Closed 12 years ago

Power Outage in MTV1

Categories

(mozilla.org Graveyard :: Server Operations, task)

task
Not set
major

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mburns, Assigned: mburns)

References

Details

(Whiteboard: [buildduty][outage][treeclosure])

~15:34 Erin Little emailed to alert us of a power outage in the Mountain View office.
[15:26:04] <arr> mburns: we're seeing a lot of alerts that indicate connection loss between scl1 and mtv1
Severity: normal → major
Latest word is that the outage will last for 2 more hours.

The AC in the IDF on 2nd floor appears to have died in the process. rbryce went to MTV1 to help triage and resolve.
All of df301 is dark, which includes all core networking and some tegras.
Group: mozilla-corporation-confidential
[16:59:12] <@rbryce> latest update from WPR-- power is still on the 2nd floor but A/C is off. Doors open and the building maint just brought a mobile unit
[16:59:27] <@rbryce> Floor 3 is still powerless,
Why does this need to be Moco conf?

We really need to start not closing bugs at the drop of a hat; whilst only seemingly minor, it's things like this that cause unnecessary rifts between employees and non-employee contributors...
(In reply to Ed Morley [:edmorley UTC+1] from comment #4)
> Why does this need to be Moco conf?
> 
> We really need to start not closing bugs at the drop of a hat; whilst only
> seemingly minor, it's things like this that cause unnecessary rifts between
> employees and non-employee contributors...

(Given that this bug is being linked on IRC and non-employees are understandably frustrated at not being able to access it, to get an ETA on the tree reopening etc)
(In reply to Ed Morley [:edmorley UTC+1] from comment #4)
> We really need to start not closing bugs at the drop of a hat; whilst only
> seemingly minor, it's things like this that cause unnecessary rifts between
> employees and non-employee contributors...

We can talk offline about this if you'd like, but a lot of IT related bugs are necessarily closed by default.  Call it habit.  I will open this one up.
Group: mozilla-corporation-confidential
(In reply to Corey Shields [:cshields] from comment #6)
> We can talk offline about this if you'd like, but a lot of IT related bugs
> are necessarily closed by default.  Call it habit.  I will open this one up.

Thank you :-)

[It was more that (a) this was closed a few comments in, rather than by default; and (b) this affected people outside of IT, due to the tree closure.]
trees closed ~3:30pm PDT
Whiteboard: [buildduty][outage][treeclosure]
Erin Little reports power has been restored.
Status: NEW → ASSIGNED
The outage was caused by construction on a different floor of the building. Servers are coming back up now.
[18:28:37] <@ravi> so far all the netops stuff is looking 5x5
[18:32:53] <@justdave> and ringring is up

...
[18:38:22] <nagios-scl1> kvm2.build.mtv1 is UP: PING OK
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Oops, should not have closed this out.

Note: the Tree is still closed. IT is working on bringing everything back online.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
nagios is showing green for releng services, releng is verifying.

We still don't have building AC but should have temp cooling hooked up in the machine room in 30-35 minutes (we already have spot coolers in the office areas).  Maintenance should be on the roof to fix the building AC sometime tonight or tomorrow morning.
Trees reopened at 8:07pm PT
We now have sufficient (yet temporary) AC cooling in the mtv1 server room.  Everything is back to operational now and I'm closing this out.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
We also can confirm that Haxxor and 2nd floor data room chillers and equipment are powered on and operating as expected.
Blocks: 796012
Product: mozilla.org → mozilla.org Graveyard