Closed
Bug 1472861
(t-yosemite-r7-327)
Opened 7 years ago
Closed 6 years ago
[MDC1] t-yosemite-r7-327 problem tracking
Categories
(Infrastructure & Operations Graveyard :: CIDuty, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: relops-bug-generator, Unassigned)
References
Details
No description provided.
Updated•7 years ago
Summary: t-yosemite-r7-327.test.releng.mdc1.mozilla.com. problem tracking → [MDC1] t-yosemite-r7-327 problem tracking
Comment 1•7 years ago
Machine was reimaged by :zfay and I took over the check stage.
Machine is not taking jobs, so I went ahead with a second re-image.
Comment 2•7 years ago
Machine has gone back to the "Stdio forwarding request failed" issue.
Comment 3•7 years ago
The machine appears in Taskcluster but does not take new jobs.
I tried to log in to it but received the following error:
Stdio forwarding request failed: Session open refused by peer
ssh_exchange_identification: Connection closed by remote host
This machine is being investigated for hardware issues through the dependent DCOps bug 1472862.
The last time it was reimaged by QTS, the machine did not come back up (no DeployStudio mail either).
no logs
no ping
no ssh
https://papertrailapp.com/groups/1223184?filter=t-yosemite-r7-327
```
[dhouse@rejh2.srv.releng.mdc1.mozilla.com ~]$ ping t-yosemite-r7-327.test.releng.mdc1.mozilla.com
PING t-yosemite-r7-327.test.releng.mdc1.mozilla.com (10.49.56.111) 56(84) bytes of data.
^C
--- t-yosemite-r7-327.test.releng.mdc1.mozilla.com ping statistics ---
80 packets transmitted, 0 received, 100% packet loss, time 79928ms
[dhouse@rejh2.srv.releng.mdc1.mozilla.com ~]$ host t-yosemite-r7-327.test.releng.mdc1.mozilla.com
t-yosemite-r7-327.test.releng.mdc1.mozilla.com has address 10.49.56.111
[dhouse@rejh2.srv.releng.mdc1.mozilla.com ~]$ ssh root@t-yosemite-r7-327.test.releng.mdc1.mozilla.com
ssh: connect to host t-yosemite-r7-327.test.releng.mdc1.mozilla.com port 22: Connection timed out
```
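When triaging several workers at once, the ping summary line in the transcript above can be checked mechanically. The following is a minimal sketch, not part of the tooling used in this bug: the names `PingSummary`, `parse_ping_summary`, and `host_is_unreachable` are hypothetical, and the pattern assumes the summary-line format shown above (with an optional "packets" before "received", as some ping implementations print).

```python
import re
from typing import NamedTuple, Optional


class PingSummary(NamedTuple):
    """Counts taken from a ping summary line (hypothetical helper type)."""
    transmitted: int
    received: int
    loss_percent: float


# Assumed format, taken from the transcript above:
# "80 packets transmitted, 0 received, 100% packet loss, time 79928ms"
_SUMMARY_RE = re.compile(
    r"(\d+) packets transmitted, (\d+) (?:packets )?received,"
    r".*?([\d.]+)% packet loss"
)


def parse_ping_summary(output: str) -> Optional[PingSummary]:
    """Return the parsed summary, or None if no summary line is found."""
    match = _SUMMARY_RE.search(output)
    if match is None:
        return None
    return PingSummary(
        transmitted=int(match.group(1)),
        received=int(match.group(2)),
        loss_percent=float(match.group(3)),
    )


def host_is_unreachable(summary: PingSummary) -> bool:
    """True when packets were sent but none came back."""
    return summary.transmitted > 0 and summary.received == 0
```

Feeding it the captured output of `ping` for a worker gives a quick yes/no on reachability before moving on to the `host` and `ssh` checks.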
Reimaged and it is up and working correctly.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Comment 6•6 years ago
The machine did not take tasks for 4 days and ssh was unresponsive. It was rebooted from Taskcluster, after which ssh came back; the reimage process was then started, and the successful termination message arrived shortly after.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 7•6 years ago
Looks good for now: https://tools.taskcluster.net/groups/LFD9J2KYSJe9osxx3Zi_3Q/tasks/RQU6etcmS52K-kspj9QnPA/runs/0
Status: REOPENED → RESOLVED
Closed: 7 years ago → 6 years ago
Resolution: --- → FIXED
Comment 8•6 years ago
Seems like we're hitting the stdio issue when trying to ssh into the machine. Looks alive as it responds to ping.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
rebooted and it looks good
https://tools.taskcluster.net/provisioners/releng-hardware/worker-types/gecko-t-osx-1010/workers/mdc1/t-yosemite-r7-327
machine was repeatedly crashing
https://papertrailapp.com/systems/t-yosemite-r7-327/events
```
Nov 07 21:29:59 t-yosemite-r7-327 com.apple.xpc.launchd: (com.apple.ReportCrash.Root[20004]): Endpoint has been activated through legacy launch(3) APIs. Please switch to XPC or bootstrap_check_in(): com.apple.ReportCrash.DirectoryService
Nov 07 21:29:59 t-yosemite-r7-327 com.apple.xpc.launchd: (com.apple.opendirectoryd[20001]): Service exited due to signal: Segmentation fault: 11
Nov 07 21:29:59 t-yosemite-r7-327 com.apple.xpc.launchd: (com.apple.configd[20002]): Service exited due to signal: Segmentation fault: 11
Nov 07 21:29:59 t-yosemite-r7-327 com.apple.xpc.launchd: (com.apple.Kerberos.digest-service[20003]): Service exited due to signal: Segmentation fault: 11
Nov 07 21:29:59 t-yosemite-r7-327 com.apple.xpc.launchd: (com.apple.ReportCrash.Root[20004]): Service exited due to signal: Segmentation fault: 11
```
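To see at a glance which services are crashing, the launchd lines above can be grouped by signal. This is a rough sketch under the assumption that the "Service exited due to signal" lines always follow the shape shown in the excerpt; the function name `crashed_services` is hypothetical and not part of any Papertrail or launchd tooling.

```python
import re
from collections import defaultdict

# Assumed line shape, taken from the excerpt above:
# "... (com.apple.configd[20002]): Service exited due to signal: Segmentation fault: 11"
_EXIT_RE = re.compile(
    r"\((?P<service>[^()\[\]]+)\[(?P<pid>\d+)\]\): "
    r"Service exited due to signal: (?P<signal>.+?): (?P<number>\d+)\s*$"
)


def crashed_services(log_text):
    """Map each signal name to the services that exited with it, in log order."""
    by_signal = defaultdict(list)
    for line in log_text.splitlines():
        match = _EXIT_RE.search(line)
        if match:
            by_signal[match.group("signal")].append(match.group("service"))
    return dict(by_signal)
```

Several unrelated system daemons segfaulting within the same second, as in the excerpt, is the kind of pattern this grouping makes easy to spot across a longer log.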
Status: REOPENED → RESOLVED
Closed: 6 years ago → 6 years ago
Resolution: --- → FIXED
Comment 10•6 years ago
High exception/busted rate on the worker and one of the jobs terminated with:
16:59:17 INFO - GECKO(1170) | [Child 1176, Main Thread] WARNING: No active window: file /builds/worker/workspace/build/src/js/xpconnect/src/XPCJSContext.cpp, line 662
16:59:17 INFO - GECKO(1170) | [Child 1176, Main Thread] WARNING: No active window: file /builds/worker/workspace/build/src/js/xpconnect/src/XPCJSContext.cpp, line 662
16:59:17 INFO - GECKO(1170) | ++DOMWINDOW == 2 (0x10b2ed800) [pid = 1176] [serial = 4] [outer = 0x10b244020]
16:59:18 INFO - checking window state
16:59:18 INFO - TEST-START | toolkit/components/thumbnails/test/browser_thumbnails_bg_no_alert.js
16:59:18 INFO - GECKO(1170) | ++DOMWINDOW == 10 (0x11e6f6000) [pid = 1172] [serial = 16] [outer = 0x127f43020]
16:59:18 INFO - GECKO(1170) | --DOMWINDOW == 0 (0x1232b5c00) [pid = 1181] [serial = 2] [outer = 0x0] [url = about:blank]
[taskcluster 2019-06-03T16:59:20.219Z] Exit Code: -1
[taskcluster 2019-06-03T16:59:20.219Z] User Time: 1m37.351634s
[taskcluster 2019-06-03T16:59:20.219Z] Kernel Time: 29.672282s
[taskcluster 2019-06-03T16:59:20.219Z] Wall Time: 13m26.359194598s
[taskcluster 2019-06-03T16:59:20.219Z] Result: FAILED
[taskcluster 2019-06-03T16:59:20.219Z] === Task Finished ===
[taskcluster 2019-06-03T16:59:20.219Z] Task Duration: 13m26.359383164s
[taskcluster 2019-06-03T16:59:20.939Z] Uploading artifact public/logs/localconfig.json from file logs/localconfig.json with content encoding "gzip", mime type "application/json" and expiry 2020-06-02T16:05:36.361Z
[taskcluster 2019-06-03T16:59:21.579Z] Uploading artifact public/test_info/manifests.list from file build/blobber_upload_dir/manifests.list with content encoding "gzip", mime type "text/plain; charset=utf-8" and expiry 2020-06-02T16:05:36.361Z
[taskcluster 2019-06-03T16:59:22.103Z] Uploading artifact public/test_info/mochitest-browser-chrome-chunked_errorsummary.log from file build/blobber_upload_dir/mochitest-browser-chrome-chunked_errorsummary.log with content encoding "gzip", mime type "text/plain" and expiry 2020-06-02T16:05:36.361Z
[taskcluster 2019-06-03T16:59:22.471Z] Uploading artifact public/test_info/mochitest-browser-chrome-chunked_raw.log from file build/blobber_upload_dir/mochitest-browser-chrome-chunked_raw.log with content encoding "gzip", mime type "text/plain" and expiry 2020-06-02T16:05:36.361Z
[taskcluster 2019-06-03T16:59:23.061Z] Uploading artifact public/test_info/system-info.log from file build/blobber_upload_dir/system-info.log with content encoding "gzip", mime type "text/plain" and expiry 2020-06-02T16:05:36.361Z
[taskcluster:error] signal: illegal instruction
The worker has been quarantined and it's under investigation.
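For comparing failures across tasks on this worker, the relevant fields of a task log like the one above can be pulled out programmatically. The sketch below is illustrative only: `summarize_task_log` is a hypothetical helper, and it assumes the `[taskcluster ...]` and `[taskcluster:error]` line prefixes appear exactly as in the excerpt.

```python
import re

# Assumed prefixes, taken from the log excerpt above.
_FIELD_RE = re.compile(r"^\[taskcluster [^\]]+\]\s*(Exit Code|Result):\s*(\S+)")
_ERROR_RE = re.compile(r"^\[taskcluster:error\]\s*(.*)$")


def summarize_task_log(log_text):
    """Collect the exit code, result, and worker error lines from a task log."""
    summary = {"exit_code": None, "result": None, "errors": []}
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        field = _FIELD_RE.match(line)
        if field:
            name, value = field.groups()
            if name == "Exit Code":
                summary["exit_code"] = int(value)
            else:
                summary["result"] = value
            continue
        error = _ERROR_RE.match(line)
        if error:
            summary["errors"].append(error.group(1))
    return summary
```

For the log above, the summary would carry the exit code, the `FAILED` result, and the `signal: illegal instruction` error, which is the line that points toward a hardware or OS-level problem rather than a test failure.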
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Updated•6 years ago
Type: task → defect
Comment 11•6 years ago
The machine seems to be up and running and taking jobs.
https://tools.taskcluster.net/provisioners/releng-hardware/worker-types/gecko-t-osx-1010/workers/mdc1/t-yosemite-r7-327
We will close the bug for now. If the problem persists, we will reopen it.
Status: REOPENED → RESOLVED
Closed: 6 years ago → 6 years ago
Resolution: --- → FIXED
Updated•5 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard