Bug 936042 (t-w864-ix-092)

t-w864-ix-092 problem tracking

RESOLVED FIXED

Status

Release Engineering
Buildduty
P3
normal
RESOLVED FIXED
4 years ago
9 months ago

People

(Reporter: jhopkins, Unassigned)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [buildduty][buildslaves][capacity])

Attachments

(1 attachment)

(Reporter)

Description

4 years ago
burning jobs due to "can't clone scripts" error.  Disabled in slavealloc.
Depends on: 937270
Re-enabled because we're pretty sure this is rooted in an issue external to the machine that breaks DNS resolution until the machine reboots. bug 937279 has more.
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
I've seen multiple instances of "Access Denied" spew during xpcshell runs contributing to timeouts on this slave. Disabled.
https://treeherder.mozilla.org/logviewer.html#?job_id=498896&repo=mozilla-aurora
Status: RESOLVED → REOPENED
Resolution: FIXED → ---

Comment 3

3 years ago
Re-imaged and re-enabled.
Status: REOPENED → RESOLVED
Last Resolved: 4 years ago → 3 years ago
Resolution: --- → FIXED
Too many failures either unique to this slave, or predominantly this slave (10 of the 25 instances of bug 1075419 have been on this). Disabled, needs diagnostics.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Duplicate of this bug: 1123129
Duplicate of this bug: 1122765
Duplicate of this bug: 1122546

Updated

3 years ago
Depends on: 1126948
Reenabled so it can burn more jobs and create more invalid test-failure bugs.
Status: REOPENED → RESOLVED
Last Resolved: 3 years ago → 3 years ago
QA Contact: armenzg → bugspam.Callek
Resolution: --- → FIXED
Or not. Rebooted twice, it's not connecting to a master. Redisabled.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Something was amiss. The host was up, but I couldn't connect via ssh or VNC. 

I've kicked off another re-image and we'll see if it comes back. I'll file a diagnostics bug if it doesn't come back in a functional state.
This machine is currently stuck setting the screen resolution: http://imgur.com/miqzfw4
disabled as last few jobs are not looking good and comment 11 suggests it needs love.

noticed this slave as slaveapi was trying to do something (shutdown?) with it before we rebooted slaveapi1 prod node.

Updated

3 years ago
Depends on: 1133164
This went rather well, wouldn't you say?
Status: REOPENED → RESOLVED
Last Resolved: 3 years ago → 3 years ago
Resolution: --- → FIXED
Seems to be quite orangetastic according to slave health, including xpcshell 120min timeouts (when it normally runs ~20min)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---

Updated

3 years ago
Depends on: 1171658
Attempting SSH reboot...Failed.
Filed IT bug for reboot (bug 1172763)
Attempting SSH reboot...Failed.
Filed IT bug for reboot (bug 1172990)

Updated

3 years ago
Depends on: 893716

Updated

3 years ago
No longer depends on: 893716
Reenabled with a new motherboard.
Status: REOPENED → RESOLVED
Last Resolved: 3 years ago → 3 years ago
Resolution: --- → FIXED
Attempting SSH reboot...Failed.
Filed IT bug for reboot (bug 1193750)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attempting SSH reboot...Failed.
Filed IT bug for reboot (bug 1195648)
Depends on: 1195785
Reenabled, we'll see.
Status: REOPENED → RESOLVED
Last Resolved: 3 years ago → 2 years ago
Resolution: --- → FIXED
Disabled. Just like the other Win8 slave that was reimaged this week, it can't make it through more than 1.5 jobs before disconnecting and requiring a reboot to send it back to work.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attempting SSH reboot...Failed.
Filed IT bug for reboot (bug 1204670)
Depends on: 1200180
Attempting SSH reboot...Failed.
Filed IT bug for reboot (bug 1205655)

Comment 25

2 years ago
hmm, the host is up and sshable. i am also able to log in to the host via IPMI with the releng password in the gpg file.

vans-MacBook-Pro:~ vle$ ping t-w864-ix-092.wintest.releng.scl3.mozilla.com
PING t-w864-ix-092.wintest.releng.scl3.mozilla.com (10.26.40.122): 56 data bytes
64 bytes from 10.26.40.122: icmp_seq=0 ttl=124 time=3.020 ms
64 bytes from 10.26.40.122: icmp_seq=1 ttl=124 time=4.563 ms
^C
--- t-w864-ix-092.wintest.releng.scl3.mozilla.com ping statistics ---
2 packets transmitted, 2 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 3.020/3.792/4.563/0.771 ms
vans-MacBook-Pro:~ vle$ ssh !$
ssh t-w864-ix-092.wintest.releng.scl3.mozilla.com
The authenticity of host 't-w864-ix-092.wintest.releng.scl3.mozilla.com (10.26.40.122)' can't be established.
RSA key fingerprint is e3:01:f7:a3:a1:b6:17:d2:b4:ca:97:c5:3c:54:56:e1.
Are you sure you want to continue connecting (yes/no)?
(In reply to Van Le [:van] from comment #25)
> hmm, the host is up and sshable. i am also able to log in to the host via
> IPMI with the releng password in the gpg file.

It apparently didn't like you looking at it, though: 12 minutes after your comment it failed out of the job it was running, hasn't taken another, and won't reboot through slaverebooter.

Curious thing about https://secure.pub.build.mozilla.org/builddata/reports/slave_health/slave.html?class=test&type=t-w864-ix&name=t-w864-ix-092 - shouldn't all those "Attempting SSH reboot...Failed. Filed IT bug..." lines have an IPMI attempt after the failed ssh attempt?
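The escalation being asked about (try SSH, then IPMI, and only file an IT bug once both have failed) can be sketched roughly as below. This is only an illustration of the expected ordering, not the actual slaverebooter or slaveapi code: `reboot_with_escalation`, the `methods` list, and the `file_bug` callback are all hypothetical names, and the real reboot attempts are stood in for by plain callables.

```python
from typing import Callable, List, Tuple

# (label, attempt) pairs; each attempt takes a hostname and returns True on success.
# These are hypothetical stand-ins, not real slaveapi calls.
RebootMethod = Tuple[str, Callable[[str], bool]]


def reboot_with_escalation(
    host: str,
    methods: List[RebootMethod],
    file_bug: Callable[[str], str],
) -> List[str]:
    """Try each reboot method in order; file a bug only if all of them fail.

    Returns the log lines that would be recorded for the host.
    """
    log: List[str] = []
    for label, attempt in methods:
        try:
            ok = attempt(host)
        except Exception:
            # Treat any error from an attempt as a failed attempt and move on.
            ok = False
        log.append(f"Attempting {label} reboot...{'Succeeded' if ok else 'Failed'}.")
        if ok:
            return log
    # Every method failed: escalate to a human via the (hypothetical) bug filer.
    log.append(file_bug(host))
    return log
```

With a method list of `[("SSH", ...), ("IPMI", ...)]`, a host whose SSH attempt fails would always show an IPMI line before any "Filed IT bug" line, which is the line missing from the slave health history above.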

Comment 27

2 years ago
looks like it failed the reimage. after checking and finding the host sshable and ipmi reachable, i gave it another reimage to see if it was the image that was causing issues - it was an issue previously, unsure if Q fixed it.

i'll attach screen shot to this bug and the child bug in dcops queue for tracking as well.

Comment 28

2 years ago
Created attachment 8663032 [details]
Screen Shot 2015-09-18 at 10.26.07 AM.png
Attempting SSH reboot...Failed.
Filed IT bug for reboot (bug 1212525)
Status: REOPENED → RESOLVED
Last Resolved: 2 years ago → 2 years ago
Resolution: --- → FIXED
Attempting SSH reboot...Failed.
Filed IT bug for reboot (bug 1214457)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attempting SSH reboot...Failed.
Filed IT bug for reboot (bug 1214814)
Status: REOPENED → RESOLVED
Last Resolved: 2 years ago → 2 years ago
Resolution: --- → FIXED
Attempting SSH reboot...Failed.
Filed IT bug for reboot (bug 1215830)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Status: REOPENED → RESOLVED
Last Resolved: 2 years ago → 2 years ago
Resolution: --- → FIXED
Attempting SSH reboot...Failed.
Filed IT bug for reboot (bug 1217556)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Disabled while it takes another trip back to iX.
Status: REOPENED → RESOLVED
Last Resolved: 2 years ago → 2 years ago
Resolution: --- → FIXED
Attempting SSH reboot...Failed.
Filed IT bug for reboot (bug 1224024)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Status: REOPENED → RESOLVED
Last Resolved: 2 years ago → 2 years ago
Resolution: --- → FIXED
Attempting SSH reboot...Failed.
Filed IT bug for reboot (bug 1225006)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Status: REOPENED → RESOLVED
Last Resolved: 2 years ago → 2 years ago
Resolution: --- → FIXED
Attempting SSH reboot...Failed.
Filed IT bug for reboot (bug 1227166)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Status: REOPENED → RESOLVED
Last Resolved: 2 years ago → 2 years ago
Resolution: --- → FIXED
Attempting SSH reboot...Failed.
Filed IT bug for reboot (bug 1228370)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Status: REOPENED → RESOLVED
Last Resolved: 2 years ago → 2 years ago
Resolution: --- → FIXED
Attempting SSH reboot...Failed.
Filed IT bug for reboot (bug 1230979)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to Phil Ringnalda (:philor) from comment #26)
> Curious thing about
> https://secure.pub.build.mozilla.org/builddata/reports/slave_health/slave.
> html?class=test&type=t-w864-ix&name=t-w864-ix-092 - shouldn't all those
> "Attempting SSH reboot...Failed. Filed IT bug..." lines have an IPMI attempt
> after the failed ssh attempt?

Still true, and...

(In reply to Phil Ringnalda (:philor) from comment #17)
> Reenabled with a new motherboard.

isn't there something that has to be done after a motherboard replacement to get IPMI reboots working again?
Attempting SSH reboot...Failed.
Filed IT bug for reboot (bug 1232565)
Status: REOPENED → RESOLVED
Last Resolved: 2 years ago → 2 years ago
Resolution: --- → FIXED
Disabled, since bug 1230979 eventually plans to give it a new disk. It's now unrebootable, but that open unreachable bug prevents us from getting another one when it fails to reboot, as it always does.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
From what I can see, the HDD has been replaced and the slave is taking jobs.
Status: REOPENED → RESOLVED
Last Resolved: 2 years ago → 2 years ago
Resolution: --- → FIXED
Attempting SSH reboot...Failed.
Filed IT bug for reboot (bug 1242319)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
What's the worst that can happen?
Status: REOPENED → RESOLVED
Last Resolved: 2 years ago → 2 years ago
Resolution: --- → FIXED
Attempting SSH reboot...Failed.
Filed IT bug for reboot (bug 1253962)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Status: REOPENED → RESOLVED
Last Resolved: 2 years ago → 2 years ago
Resolution: --- → FIXED