Closed
Bug 890035
(t-w732-ix-115)
Opened 12 years ago
Closed 8 years ago
t-w732-ix-115 problem tracking
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: bhearsum, Unassigned)
References
Details
(Whiteboard: [buildduty][xperf_q3_disabled] status-in-comment-16)
Disconnecting a lot, dunno what's happening yet. Disabled in slavealloc for now.
Reporter | ||
Comment 1•12 years ago
Trying a reimage.
Comment 2•12 years ago
Reimage done, back in production
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Comment 3•12 years ago
Showing the same reftest failures that were narrowed down to a slave issue on Friday. Given that this was just reimaged, I wonder if it's some sort of configuration issue.
https://tbpl.mozilla.org/php/getParsedLog.php?id=25262068&tree=Mozilla-Aurora
I'll also point out that the slave health page for this slave looks pretty awful overall.
https://secure.pub.build.mozilla.org/builddata/reports/slave_health/slave.html?name=t-w732-ix-115
Disabled in slavealloc.
Comment 4•12 years ago
RyanVM, can you please REOPEN when disabling it? Thanks!
It seems that it did make some WebGL tests fail.
Look at attachment 777760 [details] in bug 873566 to see that the color-depth was messed up.
Assignee | ||
Updated•12 years ago
Product: mozilla.org → Release Engineering
Reporter | ||
Comment 5•12 years ago
Hardware diagnostics passed. Back in production after a reimage, let's hope that helped.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Comment 6•12 years ago
Can't run WebGL tests. Disabled in slavealloc.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Updated•12 years ago
Comment 7•12 years ago
Rebooted into production.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Comment 8•12 years ago
Zero successful xperf runs since July 14th (e.g. https://tbpl.mozilla.org/php/getParsedLog.php?id=27984306&tree=Fx-Team).
Disabled in slavealloc.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Updated•12 years ago
Whiteboard: [buildduty] → [buildduty][xperf_q3_disabled]
Comment 10•12 years ago
Re-imaging.
It is having issues with xperf.
Comment 11•12 years ago
philor, does this ring a bell?
=1=
18:41:33 INFO - JavaScript warning: http://mochi.test:8888/tests/dom/imptests/html/webgl/common.js, line 3: WebGL: Error during ANGLE OpenGL ES initialization
=2=
http://cl.ly/SoQN
=3=
I don't see anything in here. Maybe the wallpaper?
http://cl.ly/SoQV
=4=
17:16:36 INFO - JavaScript warning: http://mochi.test:8888/tests/content/canvas/test/webgl/non-conf-tests/webgl-util.js, line 49: WebGL: Error during ANGLE OpenGL ES initialization
Comment 12•12 years ago
Never mind.
I think it is a graphical issue:
http://cl.ly/Sofw
Comment 13•11 years ago
I will file an IT bug to chase this issue (which does not always happen).
I saw this Kernel-Power issue right after a failed xperf run [1]
I also saw a "The device, \Device\Ide\iaStor0, did not respond within the timeout period." [2] before the unexpected reboot.
I see a SCSI cabling note in here:
http://social.technet.microsoft.com/Forums/windows/en-US/3356c10e-673f-4882-9e5d-b9d61bafce9f/the-device-deviceideiastor0-did-not-respond-within-the-timeout-period?forum=w7itprogeneral
There's even more info in here:
http://support.microsoft.com/kb/154690
The log ended here:
13:05:02 INFO - JavaScript error: http://localhost/page_load_test/tp5n/etsy.com/www.etsy.com/category/geekery/videogame.html, line 326: Etsy is not defined
remoteFailed: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionLost'>: Connection to the other side was lost in a non-clean fashion.
]
[1]
Log Name: System
Source: Microsoft-Windows-Kernel-Power
Date: 1/13/2014 1:07:18 PM
Event ID: 41
Task Category: (63)
Level: Critical
Keywords: (2)
User: SYSTEM
Computer: T-W732-IX-115.releng.ad.mozilla.com
Description:
The system has rebooted without cleanly shutting down first. This error could be caused if the system stopped responding, crashed, or lost power unexpectedly.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<Provider Name="Microsoft-Windows-Kernel-Power" Guid="{331C3B3A-2005-44C2-AC5E-77220C37D6B4}" />
<EventID>41</EventID>
<Version>2</Version>
<Level>1</Level>
<Task>63</Task>
<Opcode>0</Opcode>
<Keywords>0x8000000000000002</Keywords>
<TimeCreated SystemTime="2014-01-13T21:07:18.073239600Z" />
<EventRecordID>35508</EventRecordID>
<Correlation />
<Execution ProcessID="4" ThreadID="8" />
<Channel>System</Channel>
<Computer>T-W732-IX-115.releng.ad.mozilla.com</Computer>
<Security UserID="S-1-5-18" />
</System>
<EventData>
<Data Name="BugcheckCode">0</Data>
<Data Name="BugcheckParameter1">0x0</Data>
<Data Name="BugcheckParameter2">0x0</Data>
<Data Name="BugcheckParameter3">0x0</Data>
<Data Name="BugcheckParameter4">0x0</Data>
<Data Name="SleepInProgress">false</Data>
<Data Name="PowerButtonTimestamp">0</Data>
</EventData>
</Event>
[2]
Log Name: System
Source: iaStor
Date: 1/13/2014 1:04:27 PM
Event ID: 9
Task Category: None
Level: Error
Keywords: Classic
User: N/A
Computer: T-W732-IX-115.releng.ad.mozilla.com
Description:
The device, \Device\Ide\iaStor0, did not respond within the timeout period.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<Provider Name="iaStor" />
<EventID Qualifiers="49156">9</EventID>
<Level>2</Level>
<Task>0</Task>
<Keywords>0x80000000000000</Keywords>
<TimeCreated SystemTime="2014-01-13T21:04:27.134935100Z" />
<EventRecordID>35500</EventRecordID>
<Channel>System</Channel>
<Computer>T-W732-IX-115.releng.ad.mozilla.com</Computer>
<Security />
</System>
<EventData>
<Data>\Device\Ide\iaStor0</Data>
<Binary>0F0028000100000000000000090004C011111111090004C0000000000200000067452301EFCDAB89010000000000CCCC0000B0CD00000000010000001E000000100000003C5177D71000000070F00FDF</Binary>
</EventData>
</Event>
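For anyone comparing dumps like [1] and [2], here is a minimal Python sketch that pulls out the provider, event ID, timestamp, and named data fields from Event Log XML. It is only an illustration: the helper names `parse_system_time` and `summarize_event` are made up for this example, and the two XML strings are trimmed copies of the events above, not the full records.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Namespace used by the Event Xml blocks pasted in this comment.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# Trimmed copies of events [1] and [2]; only the fields read below are kept.
KERNEL_POWER_XML = """<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<Provider Name="Microsoft-Windows-Kernel-Power" />
<EventID>41</EventID>
<TimeCreated SystemTime="2014-01-13T21:07:18.073239600Z" />
</System>
<EventData>
<Data Name="BugcheckCode">0</Data>
<Data Name="SleepInProgress">false</Data>
<Data Name="PowerButtonTimestamp">0</Data>
</EventData>
</Event>"""

IASTOR_XML = """<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<Provider Name="iaStor" />
<EventID Qualifiers="49156">9</EventID>
<TimeCreated SystemTime="2014-01-13T21:04:27.134935100Z" />
</System>
<EventData>
<Data>\\Device\\Ide\\iaStor0</Data>
</EventData>
</Event>"""


def parse_system_time(value):
    """Parse a UTC SystemTime string, truncating it to microseconds."""
    base, _, frac = value.rstrip("Z").partition(".")
    micros = (frac + "000000")[:6]  # datetime cannot hold 100 ns precision
    parsed = datetime.strptime(f"{base}.{micros}", "%Y-%m-%dT%H:%M:%S.%f")
    return parsed.replace(tzinfo=timezone.utc)


def summarize_event(xml_text):
    """Return provider, event ID, UTC time, and named EventData fields."""
    root = ET.fromstring(xml_text)
    system = root.find("e:System", NS)
    named_data = {
        item.get("Name"): item.text
        for item in root.findall("e:EventData/e:Data", NS)
        if item.get("Name")  # unnamed <Data> entries are skipped
    }
    return {
        "provider": system.find("e:Provider", NS).get("Name"),
        "event_id": int(system.find("e:EventID", NS).text),
        "time": parse_system_time(
            system.find("e:TimeCreated", NS).get("SystemTime")
        ),
        "data": named_data,
    }


reboot = summarize_event(KERNEL_POWER_XML)
timeout = summarize_event(IASTOR_XML)
gap_seconds = (reboot["time"] - timeout["time"]).total_seconds()
print(reboot["event_id"], reboot["data"]["BugcheckCode"], gap_seconds)
```

With the timestamps above, the iaStor timeout precedes the unclean reboot by just under three minutes, and the Kernel-Power event carries a BugcheckCode of 0, i.e. no stop error code was recorded. That fits a hang or power/hardware problem better than a software crash, though it does not prove it.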
Updated•11 years ago
Assignee: armenzg → nobody
Reporter | ||
Comment 14•11 years ago
Armen, this machine is still locked to your dev master - what's up with it?
Flags: needinfo?(armenzg)
Comment 15•11 years ago
I was hoping DCOps would be able to fix the machine so that it could be put back into production.
It seems I will have to run more tests and request servicing by iX if it happens again.
Grabbing it.
Assignee: nobody → armenzg
Flags: needinfo?(armenzg)
Comment 16•11 years ago
Summary
#######
tl;dr: requested burn-in from iX
2013-07-03 - disconnecting a lot; re-image requested
2013-07-15 - reftest failures & lots of debugging in bug 873566 with other machines
2013-09-05 - request for diagnostics; nothing found and re-imaged
2013-09-06 - it can't run webgl tests
2013-09-12 - rebooted into production after comment 42 in bug 873566 [1]
2013-09-17 - first mention of xperf failures
2013-11-01 - put into staging
2013-12-02 - re-imaging requested
2013-12-05 - graphical setup might be related
2014-01-14 - diagnostics requested; kernel-power issues spotted - may be cabling issues
2014-01-20 - file request for burn-in
[1] https://bugzilla.mozilla.org/show_bug.cgi?id=873566#c42
Depends on: 961844
Whiteboard: [buildduty][xperf_q3_disabled] → [buildduty][xperf_q3_disabled] status-in-comment-16
Comment 17•11 years ago
Slave has returned from burn-in.
Watching builds running here: http://buildbot-master109.srv.releng.scl3.mozilla.com:8201/buildslaves/t-w732-ix-115
Comment 18•11 years ago
Back in production. One orange job and one green so far.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 11 years ago
Resolution: --- → FIXED
Comment 19•11 years ago
This is the one we should keep an eye on to see how it fares with xperf jobs.
Updated•11 years ago
Assignee: armenzg → nobody
QA Contact: armenzg → bugspam.Callek
Updated•9 years ago
Comment 20•9 years ago
Attempting SSH reboot...Failed.
Attempting IPMI reboot...Failed.
Filed IT bug for reboot (bug 1314976)
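The escalation shown in this comment (try SSH, then IPMI, then hand off to IT) can be sketched roughly as below. This is not the tooling that produced the output above: `escalate_reboot`, the callables passed to it, and the "Succeeded." wording are all assumptions made for illustration.

```python
def escalate_reboot(steps, file_bug):
    """Try each reboot method in order; file an IT bug if all of them fail.

    steps    -- list of (name, attempt) pairs; attempt() returns True on success
    file_bug -- callable returning a reference to the filed bug (hypothetical)
    """
    log = []
    for name, attempt in steps:
        try:
            succeeded = attempt()
        except Exception:
            # Treat any error from a reboot method as a failed attempt.
            succeeded = False
        if succeeded:
            log.append(f"Attempting {name} reboot...Succeeded.")
            return log
        log.append(f"Attempting {name} reboot...Failed.")
    # Every automated method failed, so hand the machine off to a human.
    log.append(f"Filed IT bug for reboot ({file_bug()})")
    return log


# Stand-in callables that always fail, mirroring the situation in this comment.
lines = escalate_reboot(
    [("SSH", lambda: False), ("IPMI", lambda: False)],
    file_bug=lambda: "bug 1314976",
)
print("\n".join(lines))
```

Ordering the steps from least to most invasive keeps the manual hand-off as the last resort, which matches the sequence reported here.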
Comment 21•9 years ago
Back online and taking jobs.
Status: REOPENED → RESOLVED
Closed: 11 years ago → 9 years ago
Resolution: --- → FIXED
Comment 22•8 years ago
Attempting SSH reboot...Failed.
Filed IT bug for reboot (bug 1373696)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Updated•8 years ago
Status: REOPENED → RESOLVED
Closed: 9 years ago → 8 years ago
Resolution: --- → FIXED
Updated•7 years ago
Product: Release Engineering → Infrastructure & Operations
Updated•5 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard