Closed Bug 1488319 Opened 7 years ago Closed 7 years ago

[MDC1] t-yosemite-r7-280 is non-responsive

Categories

(Infrastructure & Operations :: RelOps: Hardware, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: zfay, Assigned: dhouse)


Details

(Whiteboard: REQ0245527, REQ0245270)

As riman mentioned in the ciduty tracking bug, this worker is missing from Taskcluster and Papertrail. I tried to connect through SSH, but it prompts for the root password:

root@t-yosemite-r7-280.test.releng.mdc1.mozilla.com's password:

Ping responds:

Pinging t-yosemite-r7-280.test.releng.mdc1.mozilla.com [10.49.56.221] with 32 bytes of data:
Reply from 10.49.56.221: bytes=32 time=198ms TTL=58
Reply from 10.49.56.221: bytes=32 time=199ms TTL=58
Reply from 10.49.56.221: bytes=32 time=198ms TTL=58
Reply from 10.49.56.221: bytes=32 time=199ms TTL=58

Ping statistics for 10.49.56.221:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 198ms, Maximum = 199ms, Average = 198ms

I've rebooted it from Taskcluster, but it is still missing.
It probably did not get puppetized. Van, can you kick off a manual reinstall for it?
Flags: needinfo?(vle)
Opened REQ0245270 with QTS for a reimage.
Assignee: relops → vle
Flags: needinfo?(vle)
Whiteboard: REQ0245270
QTS reimaged the host. Please let me know if issues persist.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
The problem still persists: the machine is not available in Taskcluster, and I think it still didn't get puppetized; I couldn't log in to it. Can you please check it?
Status: RESOLVED → REOPENED
Flags: needinfo?(vle)
Resolution: FIXED → ---
> Can you please check it?

Not really, this is a remote data center. I can swing by on site in the next couple of weeks. Let me ask them to reimage it again.
Flags: needinfo?(vle)
I've opened REQ0245527 with QTS and asked them to verify that the Mac mini is reimaging and not just booting back into the desktop.
Whiteboard: REQ0245270 → REQ0245527, REQ0245270
The host is PXE booting, but it is not reimaging. It gets the spinning globe but boots back to the desktop. Can you confirm the MAC address is the same as the one listed in DeployStudio?
Flags: needinfo?(apop)
(In reply to Van Le [:van] from comment #8)
> the host is pxebooting but it is not reimaging. it gets the spinning globe
> but boots back to the desktop. can you confirm the mac is the same as listed
> in deploystudio?

Since the deploy wasn't successful, the deploy e-mail never came, and we also cannot SSH into it, so it's pretty hard for us to find the exact MAC address the machine has. I found the MAC in one of the move documents, but that's about it.
Flags: needinfo?(apop)
:bcrisan/:apop, what MAC do you have in DeployStudio?
I'm not sure why this failed to reimage, but it still had a puppetize.log timestamp from 2017, so it definitely had not been reimaged. I reissued the bless command against the bsdpy host just to see if it would work, and it did. It is in the process of puppetizing now.
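The reasoning above (a puppetize.log older than the reimage means the reimage never ran) can be expressed as a small check. This is a minimal sketch, not part of the RelOps tooling: the log path and the reimage date below are hypothetical placeholders, since the bug does not give either.

```python
import os
from datetime import datetime, timezone


def log_predates_reimage(log_path: str, reimage_time: datetime) -> bool:
    """Return True if the file at log_path was last modified before reimage_time.

    A host that was really reimaged and puppetized should have a puppetize.log
    written after the reimage; an older timestamp suggests the reimage never ran.
    reimage_time must be timezone-aware.
    """
    mtime = datetime.fromtimestamp(os.path.getmtime(log_path), tz=timezone.utc)
    return mtime < reimage_time


if __name__ == "__main__":
    # Hypothetical path and date: the real puppetize.log location and the
    # reimage time are not stated in this bug.
    path = "/var/log/puppetize.log"
    reimage = datetime(2018, 9, 1, tzinfo=timezone.utc)
    if os.path.exists(path):
        print(log_predates_reimage(path, reimage))
```

Comparing against the file's modification time avoids parsing the log contents, at the cost of being fooled if something later touches the file.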
Status: REOPENED → RESOLVED
Closed: 7 years ago → 7 years ago
Resolution: --- → FIXED
It seems the worker is in a bad state once again. We're hitting the STDIO issue, but the worker is pingable.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I tried to SSH into it and got 'Connection closed by UNKNOWN port 65535'. Could you take a look at this one as well, dhouse?
Flags: needinfo?(dhouse)
Assignee: vle → dhouse
Flags: needinfo?(dhouse)
Component: RelOps: General → RelOps: Hardware
QA Contact: klibby
The logs show the machine crashed:

```
Nov 06 13:42:34 t-yosemite-r7-280 com.apple.xpc.launchd: (com.apple.ReportCrash.Root): Service only ran for 0 seconds. Pushing respawn out by 10 seconds.
Nov 06 13:42:55 t-yosemite-r7-280 com.apple.xpc.launchd: (com.apple.configd[72605]): Service exited due to signal: Segmentation fault: 11
Nov 06 13:42:55 t-yosemite-r7-280 com.apple.xpc.launchd: (com.apple.WindowServer[72607]): Service exited due to signal: Segmentation fault: 11
Nov 06 13:42:55 t-yosemite-r7-280 com.apple.xpc.launchd: (com.apple.syncdefaultsd[72606]): Service exited due to signal: Segmentation fault: 11
Nov 06 13:42:55 t-yosemite-r7-280 com.apple.xpc.launchd: (com.apple.WindowServer): Service only ran for 9 seconds. Pushing respawn out by 1 seconds.
Nov 06 13:42:55 t-yosemite-r7-280 com.apple.xpc.launchd: (com.apple.ReportCrash.Root[72608]): Service exited due to signal: Segmentation fault: 11
Nov 06 13:42:55 t-yosemite-r7-280 com.apple.xpc.launchd: (com.apple.ReportCrash.Root): Service only ran for 0 seconds. Pushing respawn out by 10 seconds.
```

```
Nov 06 13:44:56 t-yosemite-r7-280.test.releng.mdc1.mozilla.com WindowServer: Server is starting up
Nov 06 13:45:06 t-yosemite-r7-280.test.releng.mdc1.mozilla.com taskgated: binary have embedded signature that validated /System/Library/PrivateFrameworks/SyncedDefaults.framework/Support/syncdefaultsd[72661]
Nov 06 13:45:07 t-yosemite-r7-280.test.releng.mdc1.mozilla.com taskgated: binary have embedded signature that validated /System/Library/Frameworks/CoreGraphics.framework/Versions/A/Resources/WindowServer[72662]
Nov 06 13:45:07 t-yosemite-r7-280.test.releng.mdc1.mozilla.com WindowServer: Server is starting up
```

SSH fails because everything keeps getting killed and restarted (the connection is logged and goes through, but then gets cut off). I'm going to try turning off the power for a minute.
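When a host is crash-looping like this, it can help to tally which launchd services are dying on a signal. The following is a minimal sketch, not something used in this bug: the function name is hypothetical, and the regular expression assumes the exact "(label[pid]): Service exited due to signal: ..." wording shown in the log excerpt above.

```python
import re
from collections import Counter

# Matches launchd lines such as:
# "... com.apple.xpc.launchd: (com.apple.configd[72605]): Service exited due to signal: Segmentation fault: 11"
# Lines without a [pid], e.g. "(com.apple.WindowServer): Service only ran for ...", do not match.
EXIT_RE = re.compile(
    r"\((?P<service>[\w.]+)\[(?P<pid>\d+)\]\): Service exited due to signal: (?P<signal>.+)$"
)


def tally_signal_exits(lines):
    """Count, per launchd service label, how many times it exited on a signal."""
    counts = Counter()
    for line in lines:
        match = EXIT_RE.search(line)
        if match:
            counts[match.group("service")] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Nov 06 13:42:55 t-yosemite-r7-280 com.apple.xpc.launchd: (com.apple.configd[72605]): Service exited due to signal: Segmentation fault: 11",
        "Nov 06 13:42:55 t-yosemite-r7-280 com.apple.xpc.launchd: (com.apple.WindowServer): Service only ran for 9 seconds. Pushing respawn out by 1 seconds.",
    ]
    print(tally_signal_exits(sample))
```

Feeding it a longer window of the system log would show whether the same handful of services (configd, WindowServer, syncdefaultsd, ReportCrash) keep segfaulting, which matches the "everything keeps getting killed" diagnosis.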
The machine took a task after reboot, but I have triggered a reimage to restore it to the original state.
Status: REOPENED → RESOLVED
Closed: 7 years ago → 7 years ago
Resolution: --- → FIXED