Closed
Bug 959663
Opened 12 years ago
Closed 12 years ago
Troubleshoot t-w732-ix-115
Categories: Infrastructure & Operations :: DCOps, task
Tracking: Not tracked
Status: RESOLVED FIXED
People: Reporter: armenzg; Assignee: Unassigned
(Adding q and markco to this issue.)
We have a Win7 machine that sometimes crashes and reboots while running the xperf unit tests.
After running it on staging, I eventually managed to reproduce the issue.
This may have been happening as far back as July ("Disconnecting a lot, dunno what's happening yet.").
Could you please determine whether new cabling is needed, or whether something else is wrong?
I saw this Kernel-Power error [1] right after a failed xperf run.
I also saw "The device, \Device\Ide\iaStor0, did not respond within the timeout period." [2] before the unexpected reboot.
There is a note about SCSI cabling here:
http://social.technet.microsoft.com/Forums/windows/en-US/3356c10e-673f-4882-9e5d-b9d61bafce9f/the-device-deviceideiastor0-did-not-respond-within-the-timeout-period?forum=w7itprogeneral
There is more information here:
http://support.microsoft.com/kb/154690
The log ended here:
13:05:02 INFO - JavaScript error: http://localhost/page_load_test/tp5n/etsy.com/www.etsy.com/category/geekery/videogame.html, line 326: Etsy is not defined
remoteFailed: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionLost'>: Connection to the other side was lost in a non-clean fashion.
]
[1]
Log Name: System
Source: Microsoft-Windows-Kernel-Power
Date: 1/13/2014 1:07:18 PM
Event ID: 41
Task Category: (63)
Level: Critical
Keywords: (2)
User: SYSTEM
Computer: T-W732-IX-115.releng.ad.mozilla.com
Description:
The system has rebooted without cleanly shutting down first. This error could be caused if the system stopped responding, crashed, or lost power unexpectedly.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<Provider Name="Microsoft-Windows-Kernel-Power" Guid="{331C3B3A-2005-44C2-AC5E-77220C37D6B4}" />
<EventID>41</EventID>
<Version>2</Version>
<Level>1</Level>
<Task>63</Task>
<Opcode>0</Opcode>
<Keywords>0x8000000000000002</Keywords>
<TimeCreated SystemTime="2014-01-13T21:07:18.073239600Z" />
<EventRecordID>35508</EventRecordID>
<Correlation />
<Execution ProcessID="4" ThreadID="8" />
<Channel>System</Channel>
<Computer>T-W732-IX-115.releng.ad.mozilla.com</Computer>
<Security UserID="S-1-5-18" />
</System>
<EventData>
<Data Name="BugcheckCode">0</Data>
<Data Name="BugcheckParameter1">0x0</Data>
<Data Name="BugcheckParameter2">0x0</Data>
<Data Name="BugcheckParameter3">0x0</Data>
<Data Name="BugcheckParameter4">0x0</Data>
<Data Name="SleepInProgress">false</Data>
<Data Name="PowerButtonTimestamp">0</Data>
</EventData>
</Event>
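The fields that matter in event [1] are in `<EventData>`: `BugcheckCode` is 0 and `PowerButtonTimestamp` is 0, so Windows recorded no stop error and no held-down power button. That points at a hard hang or a power loss rather than a blue screen. Below is a minimal Python sketch for pulling those fields out of an exported event, assuming the XML has been saved as text. `event_data` and `classify_event_41` are hypothetical helper names, and the classification is a rough triage heuristic, not Microsoft's own logic.

```python
import xml.etree.ElementTree as ET

# Default namespace declared on <Event> in the exported XML.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# Trimmed copy of event [1]: only the elements used below are kept.
EVENT_41 = """<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<Provider Name="Microsoft-Windows-Kernel-Power" />
<EventID>41</EventID>
<TimeCreated SystemTime="2014-01-13T21:07:18.073239600Z" />
</System>
<EventData>
<Data Name="BugcheckCode">0</Data>
<Data Name="SleepInProgress">false</Data>
<Data Name="PowerButtonTimestamp">0</Data>
</EventData>
</Event>"""


def event_data(xml_text):
    """Return the named <Data> fields of one exported event as a dict of strings."""
    root = ET.fromstring(xml_text)
    return {d.get("Name"): d.text for d in root.iterfind("e:EventData/e:Data", NS)}


def classify_event_41(fields):
    """Hypothetical triage helper: a rough reading of Kernel-Power Event 41 fields."""
    if int(fields["BugcheckCode"]) != 0:
        return "stop error (bugcheck) recorded"
    if int(fields.get("PowerButtonTimestamp", "0")) != 0:
        return "power button was held down"
    return "no bugcheck recorded: hard hang or power loss"


fields = event_data(EVENT_41)
print(fields["BugcheckCode"], classify_event_41(fields))
```

Matching on the namespace-qualified path is needed because every element in the export inherits the default `xmlns`, so a bare `EventData/Data` path finds nothing.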
[2]
Log Name: System
Source: iaStor
Date: 1/13/2014 1:04:27 PM
Event ID: 9
Task Category: None
Level: Error
Keywords: Classic
User: N/A
Computer: T-W732-IX-115.releng.ad.mozilla.com
Description:
The device, \Device\Ide\iaStor0, did not respond within the timeout period.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<Provider Name="iaStor" />
<EventID Qualifiers="49156">9</EventID>
<Level>2</Level>
<Task>0</Task>
<Keywords>0x80000000000000</Keywords>
<TimeCreated SystemTime="2014-01-13T21:04:27.134935100Z" />
<EventRecordID>35500</EventRecordID>
<Channel>System</Channel>
<Computer>T-W732-IX-115.releng.ad.mozilla.com</Computer>
<Security />
</System>
<EventData>
<Data>\Device\Ide\iaStor0</Data>
<Binary>0F0028000100000000000000090004C011111111090004C0000000000200000067452301EFCDAB89010000000000CCCC0000B0CD00000000010000001E000000100000003C5177D71000000070F00FDF</Binary>
</EventData>
</Event>
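Lining up the timestamps helps order what happened. The `SystemTime` values are UTC, and the Event Viewer "Date" fields are 8 hours earlier, so the machine was on PST (UTC-8). The sketch assumes the 13:05:02 test-log line is in that same zone. Event 41 is written during the next boot, so its time marks when the machine came back up rather than the instant it died. A minimal Python sketch follows; the helper name `offsets` is hypothetical and times are truncated to the second.

```python
from datetime import datetime, timedelta, timezone

# Local zone assumed for the test log: PST, matching the 8-hour gap
# between the Event Viewer "Date" fields and the UTC SystemTime values.
PST = timezone(timedelta(hours=-8))

timeline = [
    ("Kernel-Power Event 41: unclean reboot", datetime(2014, 1, 13, 21, 7, 18, tzinfo=timezone.utc)),
    ("iaStor Event 9: iaStor0 timeout", datetime(2014, 1, 13, 21, 4, 27, tzinfo=timezone.utc)),
    ("last test log line (Etsy JS error)", datetime(2014, 1, 13, 13, 5, 2, tzinfo=PST)),
]


def offsets(events):
    """Return (label, whole seconds since the earliest event), in chronological order."""
    ordered = sorted(events, key=lambda e: e[1])  # aware datetimes compare across zones
    start = ordered[0][1]
    return [(label, int((t - start).total_seconds())) for label, t in ordered]


for label, secs in offsets(timeline):
    print(f"+{secs:4d}s  {label}")
```

With these inputs the offsets are 0, 35 and 171 seconds: the iaStor timeout came first, the test log stopped about 35 seconds later, and the unclean-shutdown record was written roughly three minutes after the timeout.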
Comment 1 • 12 years ago (Reporter)
FYI, I scheduled a disk check on start-up by mistake. If I understand correctly, the machine will need hands-on intervention (unless it manages to reach the log-in page after the disk scan).
Updated • 12 years ago
colo-trip: --- → scl3
Comment 2 • 12 years ago
:armen, these hosts have front-loading drives and are four blade servers to a chassis. There are no SCSI cables, and SATA doesn't require active termination. I would need to open up the chassis (and bring down the other three hosts) to see whether any SATA cables are being used at all.
The drive diagnostics came back negative, so I've reseated the blade server in its chassis and the drive in its bay. Please run a few more tests and reopen this bug if the issues persist. We can send this to iX and have them do a 48-hour burn-in to see if they can detect any issues.
Host is back online.
[vle@admin1a.private.scl3 ~]$ fping t-w732-ix-115.wintest.releng.scl3.mozilla.com
t-w732-ix-115.wintest.releng.scl3.mozilla.com is alive
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Comment 3 • 12 years ago (Reporter)
Thanks for looking into this!
I will request a burn-in if it persists.
Comment 4 • 12 years ago (Reporter)
Filed bug 961844.
Updated • 11 years ago
Product: mozilla.org → Infrastructure & Operations