Closed
Bug 890035
(t-w732-ix-115)
Opened 11 years ago
Closed 7 years ago
t-w732-ix-115 problem tracking
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: bhearsum, Unassigned)
References
Details
(Whiteboard: [buildduty][xperf_q3_disabled] status-in-comment-16)
Disconnecting a lot, dunno what's happening yet. Disabled in slavealloc for now.
Reporter
Comment 1•11 years ago
Trying a reimage.
Comment 2•11 years ago
Reimage done, back in production
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Comment 3•11 years ago
Showing the same reftest failures that were narrowed down to a slave issue on Friday. Given that this was just reimaged, I wonder if it's some sort of configuration issue. https://tbpl.mozilla.org/php/getParsedLog.php?id=25262068&tree=Mozilla-Aurora I'll also point out that the slave health page for this slave looks pretty awful overall. https://secure.pub.build.mozilla.org/builddata/reports/slave_health/slave.html?name=t-w732-ix-115 Disabled in slavealloc.
Comment 4•11 years ago
RyanVM, can you please REOPEN when disabling it? Thanks! It seems that it did make some WebGL tests fail. Look at attachment 777760 [details] in bug 873566 to see that the color depth was messed up.
Assignee
Updated•11 years ago
Product: mozilla.org → Release Engineering
Reporter
Comment 5•11 years ago
Hardware diagnostics passed. Back in production after a reimage, let's hope that helped.
Status: REOPENED → RESOLVED
Closed: 11 years ago → 11 years ago
Resolution: --- → FIXED
Comment 6•11 years ago
Can't run WebGL tests. Disabled in slavealloc.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Updated•11 years ago
Comment 7•11 years ago
Rebooted into production.
Status: REOPENED → RESOLVED
Closed: 11 years ago → 11 years ago
Resolution: --- → FIXED
Comment 8•11 years ago
Zero successful xperf runs since July 14th (e.g. https://tbpl.mozilla.org/php/getParsedLog.php?id=27984306&tree=Fx-Team). Disabled in slavealloc.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Updated•11 years ago
Whiteboard: [buildduty] → [buildduty][xperf_q3_disabled]
Comment 10•11 years ago
Re-imaging. It is having issues with xperf.
Comment 11•11 years ago
philor, does this ring a bell?
=1= 18:41:33 INFO - JavaScript warning: http://mochi.test:8888/tests/dom/imptests/html/webgl/common.js, line 3: WebGL: Error during ANGLE OpenGL ES initialization
=2= http://cl.ly/SoQN
=3= I don't see anything in here. Maybe the wallpaper? http://cl.ly/SoQV
=4= 17:16:36 INFO - JavaScript warning: http://mochi.test:8888/tests/content/canvas/test/webgl/non-conf-tests/webgl-util.js, line 49: WebGL: Error during ANGLE OpenGL ES initialization
Comment 12•11 years ago
Never mind. I think it is a graphical issue: http://cl.ly/Sofw
Comment 13•10 years ago
I will file an IT bug to chase this issue (which does not always happen).
I saw this Kernel-Power issue right after a failed xperf run [1].
I also saw a "The device, \Device\Ide\iaStor0, did not respond within the timeout period." error [2] before the unexpected reboot.
I see a SCSI cabling note in here:
http://social.technet.microsoft.com/Forums/windows/en-US/3356c10e-673f-4882-9e5d-b9d61bafce9f/the-device-deviceideiastor0-did-not-respond-within-the-timeout-period?forum=w7itprogeneral
There's even more info in here:
http://support.microsoft.com/kb/154690
The log ended in here:
13:05:02 INFO - JavaScript error: http://localhost/page_load_test/tp5n/etsy.com/www.etsy.com/category/geekery/videogame.html, line 326: Etsy is not defined
remoteFailed: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionLost'>: Connection to the other side was lost in a non-clean fashion.
]
[1]
Log Name: System
Source: Microsoft-Windows-Kernel-Power
Date: 1/13/2014 1:07:18 PM
Event ID: 41
Task Category: (63)
Level: Critical
Keywords: (2)
User: SYSTEM
Computer: T-W732-IX-115.releng.ad.mozilla.com
Description:
The system has rebooted without cleanly shutting down first. This error could be caused if the system stopped responding, crashed, or lost power unexpectedly.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-Kernel-Power" Guid="{331C3B3A-2005-44C2-AC5E-77220C37D6B4}" />
    <EventID>41</EventID>
    <Version>2</Version>
    <Level>1</Level>
    <Task>63</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8000000000000002</Keywords>
    <TimeCreated SystemTime="2014-01-13T21:07:18.073239600Z" />
    <EventRecordID>35508</EventRecordID>
    <Correlation />
    <Execution ProcessID="4" ThreadID="8" />
    <Channel>System</Channel>
    <Computer>T-W732-IX-115.releng.ad.mozilla.com</Computer>
    <Security UserID="S-1-5-18" />
  </System>
  <EventData>
    <Data Name="BugcheckCode">0</Data>
    <Data Name="BugcheckParameter1">0x0</Data>
    <Data Name="BugcheckParameter2">0x0</Data>
    <Data Name="BugcheckParameter3">0x0</Data>
    <Data Name="BugcheckParameter4">0x0</Data>
    <Data Name="SleepInProgress">false</Data>
    <Data Name="PowerButtonTimestamp">0</Data>
  </EventData>
</Event>
[2]
Log Name: System
Source: iaStor
Date: 1/13/2014 1:04:27 PM
Event ID: 9
Task Category: None
Level: Error
Keywords: Classic
User: N/A
Computer: T-W732-IX-115.releng.ad.mozilla.com
Description:
The device, \Device\Ide\iaStor0, did not respond within the timeout period.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="iaStor" />
    <EventID Qualifiers="49156">9</EventID>
    <Level>2</Level>
    <Task>0</Task>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2014-01-13T21:04:27.134935100Z" />
    <EventRecordID>35500</EventRecordID>
    <Channel>System</Channel>
    <Computer>T-W732-IX-115.releng.ad.mozilla.com</Computer>
    <Security />
  </System>
  <EventData>
    <Data>\Device\Ide\iaStor0</Data>
    <Binary>0F0028000100000000000000090004C011111111090004C0000000000200000067452301EFCDAB89010000000000CCCC0000B0CD00000000010000001E000000100000003C5177D71000000070F00FDF</Binary>
  </EventData>
</Event>
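For anyone triaging similar machines, here is a minimal sketch of pulling the interesting fields out of an exported Event Xml blob like the Kernel-Power one in comment 13. It only uses Python's standard xml.etree.ElementTree. The helper names (summarize_event, is_unclean_reboot) and the trimmed sample are illustrative assumptions, not part of any releng tooling.

```python
import xml.etree.ElementTree as ET

# Namespace used by the Event Xml pasted in the comment above.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# Trimmed copy of the Kernel-Power event from the comment (illustrative sample).
SAMPLE_EVENT_XML = """
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-Kernel-Power" Guid="{331C3B3A-2005-44C2-AC5E-77220C37D6B4}" />
    <EventID>41</EventID>
    <TimeCreated SystemTime="2014-01-13T21:07:18.073239600Z" />
    <Computer>T-W732-IX-115.releng.ad.mozilla.com</Computer>
  </System>
  <EventData>
    <Data Name="BugcheckCode">0</Data>
    <Data Name="SleepInProgress">false</Data>
  </EventData>
</Event>
"""


def summarize_event(event_xml: str) -> dict:
    """Hypothetical helper: extract provider, event ID, time and named data."""
    root = ET.fromstring(event_xml)
    system = root.find("e:System", NS)
    named_data = {}
    for item in root.findall("e:EventData/e:Data", NS):
        name = item.get("Name")
        if name is not None:  # unnamed <Data> entries (as in the iaStor event) are skipped
            named_data[name] = item.text
    return {
        "provider": system.find("e:Provider", NS).get("Name"),
        "event_id": int(system.find("e:EventID", NS).text),
        "time_created": system.find("e:TimeCreated", NS).get("SystemTime"),
        "computer": system.find("e:Computer", NS).text,
        "data": named_data,
    }


def is_unclean_reboot(summary: dict) -> bool:
    """Hypothetical check: Kernel-Power event 41 marks a reboot without clean shutdown."""
    return (
        summary["provider"] == "Microsoft-Windows-Kernel-Power"
        and summary["event_id"] == 41
    )


if __name__ == "__main__":
    info = summarize_event(SAMPLE_EVENT_XML)
    print(info["computer"], info["time_created"], is_unclean_reboot(info))
```

The timestamp is kept as a string on purpose, since the exported value carries more fractional digits than some date parsers accept. Here BugcheckCode is 0, so the event itself records no stop code.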
Updated•10 years ago
Assignee: armenzg → nobody
Reporter
Comment 14•10 years ago
Armen, this machine is still locked to your dev master - what's up with it?
Flags: needinfo?(armenzg)
Comment 15•10 years ago
I was hoping DCOps would be able to fix the machine so it could be put back into production. It seems I will have to run more tests and request servicing by iX if it happens again. Grabbing it.
Assignee: nobody → armenzg
Flags: needinfo?(armenzg)
Comment 16•10 years ago
Summary
#######
tl;dr requested burn-in to iX
2013-07-03 - disconnecting a lot; re-image requested
2013-07-15 - reftest failures & lots of debugging in bug 873566 with other machines
2013-09-05 - request for diagnostics; nothing found and re-imaged
2013-09-06 - it can't run webgl tests
2013-09-12 - rebooted into production after comment 42 in bug 873566 [1]
2013-09-17 - first mention of xperf failures
2013-11-01 - put into staging
2013-12-02 - re-imaging requested
2013-12-05 - graphical setup might be related
2014-01-14 - diagnostics requested; kernel-power issues spotted - may be cabling issues
2014-01-20 - file request for burn-in
[1] https://bugzilla.mozilla.org/show_bug.cgi?id=873566#c42
Depends on: 961844
Whiteboard: [buildduty][xperf_q3_disabled] → [buildduty][xperf_q3_disabled] status-in-comment-16
Comment 17•10 years ago
Slave has returned from burn-in. Watching builds running here: http://buildbot-master109.srv.releng.scl3.mozilla.com:8201/buildslaves/t-w732-ix-115
Comment 18•10 years ago
Back in production. One orange job and one green so far.
Status: REOPENED → RESOLVED
Closed: 11 years ago → 10 years ago
Resolution: --- → FIXED
Comment 19•10 years ago
This is the one we should keep an eye on to see how it fares with xperf jobs.
Updated•10 years ago
Assignee: armenzg → nobody
QA Contact: armenzg → bugspam.Callek
Updated•8 years ago
Comment 20•8 years ago
Attempting SSH reboot...Failed. Attempting IPMI reboot...Failed. Filed IT bug for reboot (bug 1314976)
Comment 21•8 years ago
Back online and taking jobs.
Status: REOPENED → RESOLVED
Closed: 10 years ago → 8 years ago
Resolution: --- → FIXED
Comment 22•7 years ago
Attempting SSH reboot...Failed. Filed IT bug for reboot (bug 1373696)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Updated•7 years ago
Status: REOPENED → RESOLVED
Closed: 8 years ago → 7 years ago
Resolution: --- → FIXED
Updated•6 years ago
Product: Release Engineering → Infrastructure & Operations
Updated•4 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard