Closed Bug 464458 Opened 17 years ago Closed 17 years ago

EqualLogic member (eql01-ps01) crashed, iSCSI hosts affected

Categories

(mozilla.org Graveyard :: Server Operations, task)

task
Not set
blocker

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: Gavin, Assigned: aravind)

Details

http://mxr.mozilla.org/mobile-browser/search?string=updateAutoComplete returns: ** Fatal: Search isn't available; please complain to the webmaster [cite: bad_dbdir]
Most of the trees are dead at the moment, with indications that the files aren't readable by the web server user.
Severity: major → blocker
Looks like any VMs with storage on the EqualLogic NAS are having problems :(
Assignee: server-ops → aravind
Summary: MXR for mobile-browser is broken → VM storage on NAS is down
Not just VMs... dm-webtools02 and dm-webtools04 both have iSCSI partitions on the EqualLogic NAS, and they're both suffering.
(In reply to comment #3)
> Not just VMs... dm-webtools02 and dm-webtools04 both have iSCSI partitions on
> the EqualLogic NAS, and they're both suffering.

... which means tinderbox and mxr are dead.
Grabbed from the event log:

802:722:eql01-ps01:SP:12-Nov-2008 07:06:30.400723:emm.c:1728:INFO:28.2.30:Control modules have completed initializing, and failover is now operational.
799:1:eql01-ps01:psgd:12-Nov-2008 07:06:30.220002:psgd_group.cc:11166:INFO:18.2.0:Group member eql01-ps01 now active in the group.
783:5:eql01-ps01:SP [secondary]:12-Nov-2008 07:06:28.110006:eqllog_mbuf_Q.c:974:ERROR:2.4.0:Panic recovery from CPU0 with reason 'CPU1 (NP) timed out. Please triage the crash dump for cause'.
782:0:eql01-ps01:QRQ [secondary]:12-Nov-2008 07:06:28.110006:qrq.c:408:INFO:9.2.0:PS Series Array Firmware Version: Storage Array Firmware V3.3.1 (R68703)
782:4:eql01-ps01:SP [secondary]:12-Nov-2008 07:06:28.080005:ppool_nvram.c:194:ERROR:15.4.7:Saved function call stack, CPU 0 8049f130 8062604c 806000c8 8049d19c 8049d11c 804abc68 8049e1cc 804ad0a4 8075d52c 80494de4 00000000 00000000 00000000 00000000 00000000 00000000
779:3:eql01-ps01:SP [secondary]:12-Nov-2008 07:06:28.060004:ppool_nvram.c:188:ERROR:15.4.6:Saved CP0 registers, CPU 0 sr 0000fc01 badva c010e330 epc 8075d52c errorepc 813e2208 cause 80000000 errctl 00000000 cacheeri 30001f20 cacheerd 00000000 buserr 000000ffffffffe0 cacheerrdpa 0000008110000000
773:721:eql01-ps01:SP:12-Nov-2008 07:06:28.040722:cache_driver.cc:933:WARNING:28.3.17:Active control module cache is now in write-through mode. Array performance is degraded.
778:2:eql01-ps01:SP [secondary]:12-Nov-2008 07:06:28.030003:ppool_nvram.c:184:ERROR:15.4.5:Saved CPU registers, CPU 0 at ffffffff80d60000 v0 0000000000000000 v1 000000000000003c a0 ffffffff807fbfc4 a1 000000000000fc01 a2 ffffffff80625e98 a3 ffffffff80d84598 t0 0000000000000000 t1 ffffffff81317320 t2 0000000000000040 t3 ffffffff80494da0 t4 0000000000000000 t5 0000000000000001 t6 000000007fffeab0 t7 0000000000000000 s0 0000000000000000 s1 ffffffffffff00fe s2 ffffffff807fbfc4 s3 0000000000000000 s4 ffffffffffffffff s5 ffffffffffffffff s6 000000000000001f s7 0000000038c3b080 t8 0000000000000000 t9 00000000300b4ba0 k0 ffffffff804ad00c k1 0000000000000000 gp ffffffff808ac070 sp 000000000000000e s8 0000000000000200 ra ffffffff8062604c
775:1:eql01-ps01:SP [secondary]:12-Nov-2008 07:06:28.010002:ppool_nvram.c:265:ERROR:15.4.1:NVRAM contains valid data. This is a PANIC RECOVERY due to a panic on CPU0.
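For anyone triaging a dump like this, the event log entries can be split into fields with a small script. The following is only a sketch: the field names (seq, sub_seq, member, subsystem, source_file, and so on) are guesses inferred from the entries pasted above, not taken from any EqualLogic documentation, and the function name parse_event is hypothetical. It assumes each entry is on a single line with the line-wrap breaks removed.

```python
import re
from datetime import datetime

# Field layout inferred from the pasted entries (an assumption, not a vendor spec):
# seq:sub_seq:member:subsystem:timestamp:source_file:source_line:severity:event_id:message
EVENT_RE = re.compile(
    r"^(?P<seq>\d+):(?P<sub_seq>\d+):(?P<member>[^:]+):(?P<subsystem>[^:]+):"
    r"(?P<timestamp>\d{2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2}:\d{2}\.\d{6}):"
    r"(?P<source_file>[^:]+):(?P<source_line>\d+):"
    r"(?P<severity>[A-Z]+):(?P<event_id>\d+\.\d+\.\d+):(?P<message>.*)$"
)


def parse_event(line):
    """Return a dict of fields for one event log entry, or None if it does not match."""
    match = EVENT_RE.match(line.strip())
    if match is None:
        return None
    event = match.groupdict()
    # The timestamp itself contains colons, so it is matched as a unit above
    # and converted here. %b assumes English month abbreviations.
    event["timestamp"] = datetime.strptime(
        event["timestamp"], "%d-%b-%Y %H:%M:%S.%f"
    )
    event["source_line"] = int(event["source_line"])
    return event


if __name__ == "__main__":
    sample = (
        "773:721:eql01-ps01:SP:12-Nov-2008 07:06:28.040722:cache_driver.cc:933:"
        "WARNING:28.3.17:Active control module cache is now in write-through mode. "
        "Array performance is degraded."
    )
    event = parse_event(sample)
    print(event["timestamp"], event["severity"], event["event_id"], event["message"])
```

Sorting the parsed entries by timestamp rather than by the leading number puts the panic recovery, the write-through warning, and the failover completion in the order they happened.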
Summary: VM storage on NAS is down → EqualLogic member (eql01-ps01) crashed, iSCSI hosts affected
From emails - http://10.253.0.254

-----------------------------------------
WARNING event from storage array eql01-ps01
subsystem: SP
event: 15.3.1
time: Wed Nov 12 07:05:54 2008

NVRAM contains valid data. This is a CONTROL MODULE FAILOVER.
-----------------------------------------
WARNING event from storage array eql01-ps01
subsystem: SP
event: 28.3.51
time: Wed Nov 12 07:05:54 2008

Warning health conditions currently exist. Investigate these conditions before they affect array operation. Control module was removed from array. There are 1 outstanding health conditions. Investigate these conditions before they affect array operation.
-----------------------------------------
ERROR event from storage array eql01-ps01
subsystem: SP
event: 28.4.50
time: Wed Nov 12 07:05:54 2008

Control module in slot 0 is not functioning or not installed.
-----------------------------------------
WARNING event from storage array eql01-ps01
subsystem: SP
event: 28.3.51
time: Wed Nov 12 07:06:26 2008

Warning health conditions currently exist. Investigate these conditions before they affect array operation. Control modules are initializing. Control module failover cannot occur until the initialization completes. There are 1 outstanding health conditions. Investigate these conditions before they affect array operation.
-----------------------------------------
ERROR event from storage array eql01-ps01
subsystem: SP [secondary]
event: 15.4.1
time: Wed Nov 12 07:06:28 2008

NVRAM contains valid data. This is a PANIC RECOVERY due to a panic on CPU0.
-----------------------------------------
(In reply to comment #2)
> Looks like any VMs with storage on the EqualLogic NAS are having problems :(

Which VMs are impacted?
I gave the list to bhearsum on IRC earlier. Don't have it handy at the moment.
I'm point-of-contact from RelEng for the fix deployment this evening.
Upgraded the firmware on the controller, which should fix this problem.
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard