Bug 464458 - EqualLogic member (eql01-ps01) crashed, iSCSI hosts affected
Status: RESOLVED FIXED
Product: mozilla.org Graveyard
Classification: Graveyard
Component: Server Operations
Version: other
Platform: All All
Importance: -- blocker
Target Milestone: ---
Assigned To: Aravind Gottipati [:aravind]
QA Contact: matthew zeier [:mrz]
URL: http://mxr.mozilla.org/mobile-browser...
Duplicates: 464455
Reported: 2008-11-12 07:34 PST by :Gavin Sharp [email: gavin@gavinsharp.com]
Modified: 2015-03-12 08:17 PDT

Description :Gavin Sharp [email: gavin@gavinsharp.com] 2008-11-12 07:34:37 PST
http://mxr.mozilla.org/mobile-browser/search?string=updateAutoComplete returns:

** Fatal: Search isn't available; please complain to the webmaster [cite: bad_dbdir]
Comment 1 timeless 2008-11-12 07:47:37 PST
most of the trees are dead atm with indications that the files aren't readable by the web server user.
Comment 2 Dave Miller [:justdave] (justdave@bugzilla.org) 2008-11-12 07:56:21 PST
Looks like any VMs with storage on the EqualLogic NAS are having problems :(
Comment 3 Reed Loden [:reed] (use needinfo?) 2008-11-12 07:57:27 PST
Not just VMs... dm-webtools02 and dm-webtools04 both have iSCSI partitions on the EqualLogic NAS, and they're both suffering.
Comment 4 Reed Loden [:reed] (use needinfo?) 2008-11-12 08:01:07 PST
(In reply to comment #3)
> Not just VMs... dm-webtools02 and dm-webtools04 both have iSCSI partitions on
> the EqualLogic NAS, and they're both suffering.

... which means tinderbox and mxr are dead.
Comment 5 matthew zeier [:mrz] 2008-11-12 08:03:04 PST
Grabbed from the event log:

802:722:eql01-ps01:SP:12-Nov-2008 07:06:30.400723:emm.c:1728:INFO:28.2.30:Control modules have completed initializing, and failover is now operational.

799:1:eql01-ps01:psgd:12-Nov-2008 07:06:30.220002:psgd_group.cc:11166:INFO:18.2.0:Group member eql01-ps01 now active in the group.

783:5:eql01-ps01:SP [secondary]:12-Nov-2008 07:06:28.110006:eqllog_mbuf_Q.c:974:ERROR:2.4.0:Panic recovery from CPU0 with reason 'CPU1 (NP) timed out.   Please triage the crash dump for cause'.

782:0:eql01-ps01:QRQ [secondary]:12-Nov-2008 07:06:28.110006:qrq.c:408:INFO:9.2.0:PS Series Array Firmware Version: Storage Array Firmware V3.3.1 (R68703)

782:4:eql01-ps01:SP [secondary]:12-Nov-2008 07:06:28.080005:ppool_nvram.c:194:ERROR:15.4.7:Saved function call stack, CPU 0
  8049f130 8062604c 806000c8 8049d19c 8049d11c 804abc68 8049e1cc 804ad0a4
  8075d52c 80494de4 00000000 00000000 00000000 00000000 00000000 00000000

779:3:eql01-ps01:SP [secondary]:12-Nov-2008 07:06:28.060004:ppool_nvram.c:188:ERROR:15.4.6:Saved CP0 registers, CPU 0
       sr 0000fc01    badva c010e330      epc 8075d52c errorepc 813e2208
    cause 80000000   errctl 00000000 cacheeri 30001f20 cacheerd 00000000
   buserr 000000ffffffffe0 cacheerrdpa 0000008110000000

773:721:eql01-ps01:SP:12-Nov-2008 07:06:28.040722:cache_driver.cc:933:WARNING:28.3.17:Active control module cache is now in write-through mode. Array performance is degraded.

778:2:eql01-ps01:SP [secondary]:12-Nov-2008 07:06:28.030003:ppool_nvram.c:184:ERROR:15.4.5:Saved CPU registers, CPU 0
                     at ffffffff80d60000 v0 0000000000000000 v1 000000000000003c
 a0 ffffffff807fbfc4 a1 000000000000fc01 a2 ffffffff80625e98 a3 ffffffff80d84598
 t0 0000000000000000 t1 ffffffff81317320 t2 0000000000000040 t3 ffffffff80494da0
 t4 0000000000000000 t5 0000000000000001 t6 000000007fffeab0 t7 0000000000000000
 s0 0000000000000000 s1 ffffffffffff00fe s2 ffffffff807fbfc4 s3 0000000000000000
 s4 ffffffffffffffff s5 ffffffffffffffff s6 000000000000001f s7 0000000038c3b080
 t8 0000000000000000 t9 00000000300b4ba0 k0 ffffffff804ad00c k1 0000000000000000
 gp ffffffff808ac070 sp 000000000000000e s8 0000000000000200 ra ffffffff8062604c

775:1:eql01-ps01:SP [secondary]:12-Nov-2008 07:06:28.010002:ppool_nvram.c:265:ERROR:15.4.1:NVRAM contains valid data. This is a PANIC RECOVERY due to a panic on CPU0.
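
For anyone triaging a log like this by script, here is a rough Python sketch that splits one event-log entry into fields. It assumes each entry sits on a single line (any wrapped lines rejoined first). The field layout is inferred only from the entries pasted in this comment, not from vendor documentation, and the names `seq`, `sub`, `source`, and `parse_event` are hypothetical labels chosen for illustration.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# Layout inferred from the pasted entries (an assumption, not a documented format):
# seq:sub:member:subsystem:DD-Mon-YYYY HH:MM:SS.ffffff:file:line:LEVEL:event-id:message
LINE_RE = re.compile(
    r"^(?P<seq>\d+):(?P<sub>\d+):(?P<member>[^:]+):(?P<subsystem>[^:]+):"
    r"(?P<when>\d{2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2}:\d{2}\.\d+):"
    r"(?P<source>[^:]+:\d+):(?P<level>[A-Z]+):"
    r"(?P<event_id>\d+\.\d+\.\d+):(?P<message>.*)$"
)


class Event(NamedTuple):
    seq: int
    member: str
    subsystem: str
    when: datetime
    source: str      # "file:line" as printed in the log
    level: str
    event_id: str
    message: str


def parse_event(line: str) -> Optional[Event]:
    """Return an Event for a log entry line, or None for dump/continuation lines."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return Event(
        seq=int(m["seq"]),
        member=m["member"],
        subsystem=m["subsystem"],
        # %b expects English month abbreviations (true in the default C locale)
        when=datetime.strptime(m["when"], "%d-%b-%Y %H:%M:%S.%f"),
        source=m["source"],
        level=m["level"],
        event_id=m["event_id"],
        message=m["message"],
    )
```

Register and stack dump rows do not match the pattern and come back as `None`, so a caller can attach them to the preceding entry or skip them. Sorting the parsed entries by `when` gives the sequence oldest first, the reverse of the order shown above.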
Comment 6 Ben Hearsum (:bhearsum) 2008-11-12 08:16:03 PST
*** Bug 464455 has been marked as a duplicate of this bug. ***
Comment 7 matthew zeier [:mrz] 2008-11-12 08:19:45 PST
From emails - 

http://10.253.0.254

-----------------------------------------
WARNING event from storage array eql01-ps01
subsystem: SP
    event: 15.3.1
     time: Wed Nov 12 07:05:54 2008

NVRAM contains valid data. This is a CONTROL MODULE FAILOVER.
-----------------------------------------
WARNING event from storage array eql01-ps01
subsystem: SP
    event: 28.3.51
     time: Wed Nov 12 07:05:54 2008

Warning health conditions currently exist.
	Investigate these conditions before they affect array operation.
	Control module was removed from array.
	There are 1 outstanding health conditions.  Investigate these conditions before they affect array operation.
-----------------------------------------
ERROR event from storage array eql01-ps01
subsystem: SP
    event: 28.4.50
     time: Wed Nov 12 07:05:54 2008

Control module in slot 0 is not functioning or not installed.
-----------------------------------------
WARNING event from storage array eql01-ps01
subsystem: SP
    event: 28.3.51
     time: Wed Nov 12 07:06:26 2008

Warning health conditions currently exist.
	Investigate these conditions before they affect array operation.
	Control modules are initializing. Control module failover cannot occur until the initialization completes.
	There are 1 outstanding health conditions.  Investigate these conditions before they affect array operation.
-----------------------------------------
ERROR event from storage array eql01-ps01
subsystem: SP [secondary]
    event: 15.4.1
     time: Wed Nov 12 07:06:28 2008

NVRAM contains valid data. This is a PANIC RECOVERY due to a panic on CPU0.
-----------------------------------------
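
The alert emails above follow a regular block layout, so a small parser can turn them into records for building a timeline. The following is only a sketch in Python that assumes the layout seen in this comment (a dashed separator line, a "LEVEL event from storage array NAME" header, then `subsystem`, `event`, and `time` fields, then free-text message lines). The function name `parse_alerts` and the dictionary keys are hypothetical choices for illustration.

```python
import re
from datetime import datetime

# Header line as it appears in the pasted emails (layout assumed from this thread only).
HEADER_RE = re.compile(r"^(?P<level>[A-Z]+) event from storage array (?P<member>\S+)$")
FIELDS = ("subsystem", "event", "time")


def parse_alerts(text):
    """Split pasted alert emails on dashed separator lines and parse each block."""
    alerts = []
    for block in re.split(r"^-{5,}[ \t]*$", text, flags=re.MULTILINE):
        lines = [ln.strip() for ln in block.strip().splitlines()]
        if not lines:
            continue
        head = HEADER_RE.match(lines[0])
        if head is None:
            continue  # not an alert block (for example, surrounding comment text)
        alert = {"level": head["level"], "member": head["member"], "message": []}
        for ln in lines[1:]:
            key, sep, value = ln.partition(": ")
            if sep and key in FIELDS and key not in alert:
                alert[key] = value
            elif ln:
                alert["message"].append(ln)
        if "time" in alert:
            # e.g. "Wed Nov 12 07:05:54 2008"; %a and %b expect English names
            alert["time"] = datetime.strptime(alert["time"], "%a %b %d %H:%M:%S %Y")
        alerts.append(alert)
    return alerts
```

Filtering the result on `level == "ERROR"` would leave just the two error events in this paste: the missing control module in slot 0 and the panic recovery on CPU0.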
Comment 8 John O'Duinn [:joduinn] (please use "needinfo?" flag) 2008-11-12 08:22:16 PST
(In reply to comment #2)
> Looks like any VMs with storage on the EqualLogic NAS are having problems :(

Which VMs are impacted?
Comment 9 Dave Miller [:justdave] (justdave@bugzilla.org) 2008-11-12 12:33:59 PST
I gave the list to bhearsum on IRC earlier.  Don't have it handy at the moment.
Comment 10 Nick Thomas [:nthomas] 2008-11-12 15:12:09 PST
I'm point-of-contact from RelEng for the fix deployment this evening.
Comment 11 Aravind Gottipati [:aravind] 2008-11-12 20:46:28 PST
Upgraded the firmware on the controller, which should fix this problem.
