Bug 464458
Opened 17 years ago
Closed 17 years ago
EqualLogic member (eql01-ps01) crashed, iSCSI hosts affected
Categories
(mozilla.org Graveyard :: Server Operations, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: Gavin, Assigned: aravind)
Details
http://mxr.mozilla.org/mobile-browser/search?string=updateAutoComplete returns:
** Fatal: Search isn't available; please complain to the webmaster [cite: bad_dbdir]
Most of the trees are dead at the moment, with indications that the files aren't readable by the web server user.
Severity: major → blocker
Comment 2•17 years ago
Looks like any VMs with storage on the EqualLogic NAS are having problems :(
Assignee: server-ops → aravind
Summary: MXR for mobile-browser is broken → VM storage on NAS is down
Comment 3•17 years ago
Not just VMs... dm-webtools02 and dm-webtools04 both have iSCSI partitions on the EqualLogic NAS, and they're both suffering.
Comment 4•17 years ago
(In reply to comment #3)
> Not just VMs... dm-webtools02 and dm-webtools04 both have iSCSI partitions on
> the EqualLogic NAS, and they're both suffering.
... which means tinderbox and mxr are dead.
Comment 5•17 years ago
Grabbed from the event log:
802:722:eql01-ps01:SP:12-Nov-2008 07:06:30.400723:emm.c:1728:INFO:28.2.30:Control modules have completed initializing, and failover is now operational.
799:1:eql01-ps01:psgd:12-Nov-2008 07:06:30.220002:psgd_group.cc:11166:INFO:18.2.0:Group member eql01-ps01 now active in the group.
783:5:eql01-ps01:SP [secondary]:12-Nov-2008 07:06:28.110006:eqllog_mbuf_Q.c:974:ERROR:2.4.0:Panic recovery from CPU0 with reason 'CPU1 (NP) timed out. Please triage the crash dump for cause'.
782:0:eql01-ps01:QRQ [secondary]:12-Nov-2008 07:06:28.110006:qrq.c:408:INFO:9.2.0:PS Series Array Firmware Version: Storage Array Firmware V3.3.1 (R68703)
782:4:eql01-ps01:SP [secondary]:12-Nov-2008 07:06:28.080005:ppool_nvram.c:194:ERROR:15.4.7:Saved function call stack, CPU 0
8049f130 8062604c 806000c8 8049d19c 8049d11c 804abc68 8049e1cc 804ad0a4
8075d52c 80494de4 00000000 00000000 00000000 00000000 00000000 00000000
779:3:eql01-ps01:SP [secondary]:12-Nov-2008 07:06:28.060004:ppool_nvram.c:188:ERROR:15.4.6:Saved CP0 registers, CPU 0
sr 0000fc01 badva c010e330 epc 8075d52c errorepc 813e2208
cause 80000000 errctl 00000000 cacheeri 30001f20 cacheerd 00000000
buserr 000000ffffffffe0 cacheerrdpa 0000008110000000
773:721:eql01-ps01:SP:12-Nov-2008 07:06:28.040722:cache_driver.cc:933:WARNING:28.3.17:Active control module cache is now in write-through mode. Array performance is degraded.
778:2:eql01-ps01:SP [secondary]:12-Nov-2008 07:06:28.030003:ppool_nvram.c:184:ERROR:15.4.5:Saved CPU registers, CPU 0
at ffffffff80d60000 v0 0000000000000000 v1 000000000000003c
a0 ffffffff807fbfc4 a1 000000000000fc01 a2 ffffffff80625e98 a3 ffffffff80d84598
t0 0000000000000000 t1 ffffffff81317320 t2 0000000000000040 t3 ffffffff80494da0
t4 0000000000000000 t5 0000000000000001 t6 000000007fffeab0 t7 0000000000000000
s0 0000000000000000 s1 ffffffffffff00fe s2 ffffffff807fbfc4 s3 0000000000000000
s4 ffffffffffffffff s5 ffffffffffffffff s6 000000000000001f s7 0000000038c3b080
t8 0000000000000000 t9 00000000300b4ba0 k0 ffffffff804ad00c k1 0000000000000000
gp ffffffff808ac070 sp 000000000000000e s8 0000000000000200 ra ffffffff8062604c
775:1:eql01-ps01:SP [secondary]:12-Nov-2008 07:06:28.010002:ppool_nvram.c:265:ERROR:15.4.1:NVRAM contains valid data. This is a PANIC RECOVERY due to a panic on CPU0.
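For anyone triaging a dump like the one above, here is a minimal Python sketch that splits event log entries into fields and lists the ERROR entries oldest first (the log itself is pasted newest first). It assumes each entry has been joined onto a single line. The field names (`seq`, `sub`, `member`, `source_file`, and so on) are guesses inferred from the entries in this comment, not taken from vendor documentation, and `parse_entry` is a hypothetical helper. Lines that do not look like an entry, such as the register and stack dumps, are skipped.

```python
import re
from datetime import datetime

# Field names are assumptions inferred from the pasted entries, not a vendor spec.
ENTRY_RE = re.compile(
    r"^(?P<seq>\d+):(?P<sub>\d+):(?P<member>[^:]+):(?P<subsystem>[^:]+):"
    r"(?P<timestamp>\d{2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2}:\d{2}\.\d+):"
    r"(?P<source_file>[^:]+):(?P<source_line>\d+):"
    r"(?P<severity>[A-Z]+):(?P<event_id>[\d.]+):(?P<message>.*)$"
)


def parse_entry(line):
    """Return a dict of fields for one log entry, or None for non-entry lines."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    # Example timestamp: 12-Nov-2008 07:06:28.110006
    entry["timestamp"] = datetime.strptime(
        entry["timestamp"], "%d-%b-%Y %H:%M:%S.%f"
    )
    return entry


# Entries copied from the log above, joined onto one line each.
SAMPLE = [
    "802:722:eql01-ps01:SP:12-Nov-2008 07:06:30.400723:emm.c:1728:INFO:28.2.30:"
    "Control modules have completed initializing, and failover is now operational.",
    "783:5:eql01-ps01:SP [secondary]:12-Nov-2008 07:06:28.110006:eqllog_mbuf_Q.c:974:"
    "ERROR:2.4.0:Panic recovery from CPU0 with reason 'CPU1 (NP) timed out. "
    "Please triage the crash dump for cause'.",
    "sr 0000fc01 badva c010e330 epc 8075d52c errorepc 813e2208",
    "775:1:eql01-ps01:SP [secondary]:12-Nov-2008 07:06:28.010002:ppool_nvram.c:265:"
    "ERROR:15.4.1:NVRAM contains valid data. This is a PANIC RECOVERY due to a "
    "panic on CPU0.",
]

entries = [e for e in map(parse_entry, SAMPLE) if e is not None]
errors = sorted(
    (e for e in entries if e["severity"] == "ERROR"),
    key=lambda e: e["timestamp"],
)

for e in errors:
    print(e["timestamp"], e["subsystem"], e["event_id"], e["message"])
```

The timestamp is matched as one unit because it contains colons itself, so a plain split on `:` would cut it apart.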
Summary: VM storage on NAS is down → EqualLogic member (eql01-ps01) crashed, iSCSI hosts affected
Comment 7•17 years ago
From emails -
http://10.253.0.254
-----------------------------------------
WARNING event from storage array eql01-ps01
subsystem: SP
event: 15.3.1
time: Wed Nov 12 07:05:54 2008
NVRAM contains valid data. This is a CONTROL MODULE FAILOVER.
-----------------------------------------
WARNING event from storage array eql01-ps01
subsystem: SP
event: 28.3.51
time: Wed Nov 12 07:05:54 2008
Warning health conditions currently exist.
Investigate these conditions before they affect array operation.
Control module was removed from array.
There are 1 outstanding health conditions. Investigate these conditions before they affect array operation.
-----------------------------------------
ERROR event from storage array eql01-ps01
subsystem: SP
event: 28.4.50
time: Wed Nov 12 07:05:54 2008
Control module in slot 0 is not functioning or not installed.
-----------------------------------------
WARNING event from storage array eql01-ps01
subsystem: SP
event: 28.3.51
time: Wed Nov 12 07:06:26 2008
Warning health conditions currently exist.
Investigate these conditions before they affect array operation.
Control modules are initializing. Control module failover cannot occur until the initialization completes.
There are 1 outstanding health conditions. Investigate these conditions before they affect array operation.
-----------------------------------------
ERROR event from storage array eql01-ps01
subsystem: SP [secondary]
event: 15.4.1
time: Wed Nov 12 07:06:28 2008
NVRAM contains valid data. This is a PANIC RECOVERY due to a panic on CPU0.
-----------------------------------------
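The alert emails above follow a regular layout, so a rough timeline can be pulled out of them. The following Python sketch assumes the layout seen in this comment: a dashed separator, a "SEVERITY event from storage array NAME" header, then `subsystem:`, `event:` and `time:` lines followed by free-text message lines. `parse_alerts` is a hypothetical helper, and the sample text is copied from two of the alerts above.

```python
import re
from datetime import datetime

HEADER_RE = re.compile(
    r"^(?P<severity>[A-Z]+) event from storage array (?P<member>\S+)$"
)


def parse_alerts(text):
    """Split alert email text into a list of dicts, one per alert."""
    alerts = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or set(line) == {"-"}:
            continue  # blank line or dashed separator
        header = HEADER_RE.match(line)
        if header:
            current = {**header.groupdict(), "message": []}
            alerts.append(current)
            continue
        if current is None:
            continue  # text before the first alert header
        key, sep, value = line.partition(": ")
        if sep and key in ("subsystem", "event", "time") and key not in current:
            current[key] = value
        else:
            current["message"].append(line)
    for alert in alerts:
        if "time" in alert:
            # Example: Wed Nov 12 07:05:54 2008
            alert["time"] = datetime.strptime(alert["time"], "%a %b %d %H:%M:%S %Y")
    return alerts


SAMPLE = """\
-----------------------------------------
ERROR event from storage array eql01-ps01
subsystem: SP
event: 28.4.50
time: Wed Nov 12 07:05:54 2008
Control module in slot 0 is not functioning or not installed.
-----------------------------------------
ERROR event from storage array eql01-ps01
subsystem: SP [secondary]
event: 15.4.1
time: Wed Nov 12 07:06:28 2008
NVRAM contains valid data. This is a PANIC RECOVERY due to a panic on CPU0.
-----------------------------------------
"""

alerts = parse_alerts(SAMPLE)
gap = alerts[1]["time"] - alerts[0]["time"]
print(f"{len(alerts)} alerts, {gap.total_seconds():.0f} s between first and last")
```

On these two alerts the gap between the slot 0 control module error and the panic recovery message comes out at 34 seconds.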
Comment 8•17 years ago
(In reply to comment #2)
> Looks like any VMs with storage on the EqualLogic NAS are having problems :(
Which VMs are impacted?
Comment 9•17 years ago
I gave the list to bhearsum on IRC earlier. Don't have it handy at the moment.
Comment 10•17 years ago
I'm point-of-contact from RelEng for the fix deployment this evening.
Comment 11•17 years ago
Upgraded the firmware on the controller, which should fix this problem.
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Updated•11 years ago
Product: mozilla.org → mozilla.org Graveyard