Closed Bug 1195785 Opened 9 years ago Closed 9 years ago

Please run diagnostics on t-w864-ix-092

Categories

(Infrastructure & Operations :: DCOps, task)

x86_64
Windows 8
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: aselagea, Assigned: van)

References

Details

(Whiteboard: #XRO-337-12891)

Attachments

(1 file)

This slave has no internet connection. I only managed to connect to it via KVM console. Also, it still runs at 1024x768.
running diagnostics
colo-trip: --- → scl3
Whiteboard: running diagnostics
This host was at iX for a few months for burn-in tests. I reran diags and it looks like the HDD test is taking 7+ hours to complete. I opened #XRO-337-12891 with iX to get a drive RMA and see if that resolves the issue.
Whiteboard: running diagnostics → #XRO-337-12891
Assignee: server-ops-dcops → vle
I am unable to image this host; please see the attached error message. Are there any config issues with this host on the back end?
QA Contact: jbarnell
Amy, I'm getting an error message while trying to image. I've run diags and everything looks OK. Who is my best PoC to look at this issue?
Flags: needinfo?(arich)
Yes, the imaging server is currently not functional.
Depends on: 1200180
Flags: needinfo?(arich) → needinfo?(q)
Looks like imaging is fixed? I've reimaged the host. Please let me know of any issues.

vans-MacBook-Pro:~ vle$ fping t-w864-ix-092.wintest.releng.scl3.mozilla.com
t-w864-ix-092.wintest.releng.scl3.mozilla.com is alive
vans-MacBook-Pro:~ vle$ ssh !$
ssh t-w864-ix-092.wintest.releng.scl3.mozilla.com
The authenticity of host 't-w864-ix-092.wintest.releng.scl3.mozilla.com (10.26.40.122)' can't be established.
RSA key fingerprint is e3:01:f7:a3:a1:b6:17:d2:b4:ca:97:c5:3c:54:56:e1.
Are you sure you want to continue connecting (yes/no)?
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Looks like imaging maybe isn't fixed: this slave and t-w864-ix-043 were reimaged this week, and they are both doing the same thing: one successful job, one where they disappear in the middle of running the second job, nothing until a reboot, maybe another 1.5 jobs, then settling into doing half a job, disappearing, requiring a reboot to do another half a job.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Looking into the issue. I would like to know more about what "disappearing" means in this case.
Flags: needinfo?(q)
Disappearing from the sight of the master, "[Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionLost'>: Connection to the other side was lost in a non-clean fashion." It can mean that the slave spontaneously rebooted, or shut itself off, or just stopped responding in any way, or probably lots of other things I've never heard about that leave no trace to an outside observer other than "disappeared off the face of the earth."
van: did we get a new hard drive on this machine in between comments 2 and 3? I can't tell from the update here. I'm wondering if the issues philor is seeing are related to this specific host or if it's a larger problem with the automated install and management (or something else).
Flags: needinfo?(vle)
:arr, the drive was replaced, and I also reran diags to make sure there was no hardware issue when I encountered the reimaging issues in comment 5.
Flags: needinfo?(vle)
Brought back online in bug 1204670.
Status: REOPENED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
