Closed
Bug 1488319
Opened 7 years ago
Closed 7 years ago
[MDC1] t-yosemite-r7-280 is non-responsive
Categories: Infrastructure & Operations :: RelOps: Hardware (task)
Tracking: Not tracked
Status: RESOLVED FIXED
People: Reporter: zfay, Assigned: dhouse
Details: Whiteboard: REQ0245527, REQ0245270
As riman mentioned in the ciduty tracking bug:
This worker is missing from Taskcluster and Papertrail.
I tried to connect through SSH but the root password is required:
root@t-yosemite-r7-280.test.releng.mdc1.mozilla.com's password:...
Ping responds:
Pinging t-yosemite-r7-280.test.releng.mdc1.mozilla.com [10.49.56.221] with 32 bytes of data:
Reply from 10.49.56.221: bytes=32 time=198ms TTL=58
Reply from 10.49.56.221: bytes=32 time=199ms TTL=58
Reply from 10.49.56.221: bytes=32 time=198ms TTL=58
Reply from 10.49.56.221: bytes=32 time=199ms TTL=58
Ping statistics for 10.49.56.221:
Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 198ms, Maximum = 199ms, Average = 198ms
I've rebooted it from Taskcluster, but it is still missing.
Comment 2•7 years ago
It probably did not get puppetized.
Van, can you kick off a manual reinstall for it?
Updated•7 years ago
Flags: needinfo?(vle)
Comment 3•7 years ago
REQ0245270 opened with QTS for reimage.
Assignee: relops → vle
Flags: needinfo?(vle)
Whiteboard: REQ0245270
Comment 4•7 years ago
QTS reimaged the host. Please let me know if issues persist.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Comment 5•7 years ago
The problem still persists: the machine is not available in Taskcluster, and I think it still didn't get puppetized. I couldn't log in to it. Can you please check it?
Status: RESOLVED → REOPENED
Flags: needinfo?(vle)
Resolution: FIXED → ---
Comment 6•7 years ago
> Can you please check it?
Not really; this is a remote data center. I can swing by the site in the next couple of weeks. Let me ask them to reimage it again.
Flags: needinfo?(vle)
Comment 7•7 years ago
I've opened REQ0245527 with QTS and asked them to verify that the Mac mini is actually reimaging and not just booting back to the desktop.
Whiteboard: REQ0245270 → REQ0245527, REQ0245270
Comment 8•7 years ago
The host is PXE booting, but it is not reimaging. It gets the spinning globe but boots back to the desktop. Can you confirm the MAC address is the same as the one listed in DeployStudio?
Flags: needinfo?(apop)
Comment 9•7 years ago
(In reply to Van Le [:van] from comment #8)
> the host is pxebooting but it is not reimaging. it gets the spinning globe
> but boots back to the desktop. can you confirm the mac is the same as listed
> in deploystudio?
Since the deploy wasn't successful, the deploy e-mail never came, and we also cannot SSH into it, so it's hard for us to find the exact MAC address of the machine.
I found the MAC address in one of the move documents, but that's about it.
Updated•7 years ago
Flags: needinfo?(apop)
Comment 10•7 years ago
:bcrisan/:apop, what MAC address do you have in DeployStudio?
Comment 11•7 years ago
I'm not sure why this failed to reimage, but it still had a puppetize.log timestamp from 2017, so it definitely had not reimaged. I reissued the bless command against the bsdpy host just to see if it would work, and it did. It is in the process of puppetizing now.
Comment 12•7 years ago (Reporter)
Worker is now taking jobs and completing them.
https://tools.taskcluster.net/provisioners/releng-hardware/worker-types/gecko-t-osx-1010/workers/mdc1/t-yosemite-r7-280
Status: REOPENED → RESOLVED
Closed: 7 years ago → 7 years ago
Resolution: --- → FIXED
Comment 13•7 years ago
It seems the worker is in a bad state once again. We're hitting the STDIO issue, but the worker is pingable.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 14•7 years ago (Reporter)
Tried to ssh into it and got 'Connection closed by UNKNOWN port 65535'. Could you take a look at this one as well, dhouse?
Flags: needinfo?(dhouse)
Component: RelOps: General → RelOps: Hardware
QA Contact: klibby
Comment 16•7 years ago (Assignee)
The logs show the machine crashed.
```
Nov 06 13:42:34 t-yosemite-r7-280 com.apple.xpc.launchd: (com.apple.ReportCrash.Root): Service only ran for 0 seconds. Pushing respawn out by 10 seconds.
Nov 06 13:42:55 t-yosemite-r7-280 com.apple.xpc.launchd: (com.apple.configd[72605]): Service exited due to signal: Segmentation fault: 11
Nov 06 13:42:55 t-yosemite-r7-280 com.apple.xpc.launchd: (com.apple.WindowServer[72607]): Service exited due to signal: Segmentation fault: 11
Nov 06 13:42:55 t-yosemite-r7-280 com.apple.xpc.launchd: (com.apple.syncdefaultsd[72606]): Service exited due to signal: Segmentation fault: 11
Nov 06 13:42:55 t-yosemite-r7-280 com.apple.xpc.launchd: (com.apple.WindowServer): Service only ran for 9 seconds. Pushing respawn out by 1 seconds.
Nov 06 13:42:55 t-yosemite-r7-280 com.apple.xpc.launchd: (com.apple.ReportCrash.Root[72608]): Service exited due to signal: Segmentation fault: 11
Nov 06 13:42:55 t-yosemite-r7-280 com.apple.xpc.launchd: (com.apple.ReportCrash.Root): Service only ran for 0 seconds. Pushing respawn out by 10 seconds.
```
```
Nov 06 13:44:56 t-yosemite-r7-280.test.releng.mdc1.mozilla.com WindowServer: Server is starting up
Nov 06 13:45:06 t-yosemite-r7-280.test.releng.mdc1.mozilla.com taskgated: binary have embedded signature that validated /System/Library/PrivateFrameworks/SyncedDefaults.framework/Support/syncdefaultsd[72661]
Nov 06 13:45:07 t-yosemite-r7-280.test.releng.mdc1.mozilla.com taskgated: binary have embedded signature that validated /System/Library/Frameworks/CoreGraphics.framework/Versions/A/Resources/WindowServer[72662]
Nov 06 13:45:07 t-yosemite-r7-280.test.releng.mdc1.mozilla.com WindowServer: Server is starting up
```
SSH fails because everything keeps getting killed and respawned (the connection is logged and goes through, but then gets cut off).
I'm going to try turning the power off for a minute.
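The crash diagnosis above rests on launchd repeatedly reporting services exiting on a signal. As a minimal sketch, assuming a plain-text copy of the log in the launchd format quoted above, this tallies those exits per service and signal. The `crashed_services` helper and `EXIT_RE` pattern are hypothetical illustrations, not part of any RelOps tooling.

```python
import re
from collections import Counter

# Hypothetical pattern for launchd lines such as:
#   ... com.apple.xpc.launchd: (com.apple.configd[72605]): Service exited due to signal: Segmentation fault: 11
# Lines without a bracketed PID (e.g. "Service only ran for 9 seconds") do not match.
EXIT_RE = re.compile(
    r"com\.apple\.xpc\.launchd: \((?P<service>[\w.]+)\[(?P<pid>\d+)\]\): "
    r"Service exited due to signal: (?P<signal>.+)$"
)


def crashed_services(lines):
    """Count launchd 'exited due to signal' events per (service, signal) pair."""
    counts = Counter()
    for line in lines:
        match = EXIT_RE.search(line)
        if match:
            counts[(match.group("service"), match.group("signal"))] += 1
    return counts


if __name__ == "__main__":
    # Sample lines copied from the log excerpt in this comment.
    sample = [
        "Nov 06 13:42:55 t-yosemite-r7-280 com.apple.xpc.launchd: (com.apple.configd[72605]): Service exited due to signal: Segmentation fault: 11",
        "Nov 06 13:42:55 t-yosemite-r7-280 com.apple.xpc.launchd: (com.apple.WindowServer[72607]): Service exited due to signal: Segmentation fault: 11",
        "Nov 06 13:42:55 t-yosemite-r7-280 com.apple.xpc.launchd: (com.apple.WindowServer): Service only ran for 9 seconds. Pushing respawn out by 1 seconds.",
        "Nov 06 13:42:55 t-yosemite-r7-280 com.apple.xpc.launchd: (com.apple.ReportCrash.Root[72608]): Service exited due to signal: Segmentation fault: 11",
    ]
    for (service, signal), n in crashed_services(sample).most_common():
        print(f"{service}: {signal} (x{n})")
```

Several core services (configd, WindowServer, ReportCrash) segfaulting within the same second points to a system-wide problem rather than one misbehaving job, which fits the decision to power-cycle and then reimage.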
Comment 17•7 years ago (Assignee)
The machine took a task after the reboot, but I have triggered a reimage to restore it to its original state.
Status: REOPENED → RESOLVED
Closed: 7 years ago → 7 years ago
Resolution: --- → FIXED