Bug 1479065
(t-yosemite-r7-350)
Opened 7 years ago
Closed 7 years ago
[MDC1] t-yosemite-r7-350 problem tracking
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: riman, Unassigned)
I have tried to connect via SSH to this machine, but it returned this:
Stdio forwarding request failed: Session open refused by peer
ssh_exchange_identification: Connection closed by remote host
The last job ended in an exception (exit code 69, as reported in bug 1478525), but that did not cause generic-worker to stop running. It kept running normally and then the machine went down for "maintenance" (see the 'Maintenance Sleep' entry in the log below). Something caused it to power down.
https://tools.taskcluster.net/provisioners/releng-hardware/worker-types/gecko-t-osx-1010/workers/mdc1/t-yosemite-r7-350
```
Jul 25 17:36:02 t-yosemite-r7-350.test.releng.mdc1.mozilla.com generic-worker: 2018/07/25 17:36:02 Querying queue to get latest status for task agIZgairQDa_6RGnTvDh_w...
Jul 25 17:36:02 t-yosemite-r7-350.test.releng.mdc1.mozilla.com generic-worker: 2018/07/25 17:36:02 Latest status: Errored
Jul 25 17:36:02 t-yosemite-r7-350.test.releng.mdc1.mozilla.com generic-worker: 2018/07/25 17:36:02 Resolving task agIZgairQDa_6RGnTvDh_w ...
Jul 25 17:36:02 t-yosemite-r7-350.test.releng.mdc1.mozilla.com generic-worker: 2018/07/25 17:36:02 Not updating status of task agIZgairQDa_6RGnTvDh_w run 0 from Errored to Failed. This is because you can only update to status Failed if the previous status was one of: [Claimed Reclaimed Aborted]
Jul 25 17:36:02 t-yosemite-r7-350.test.releng.mdc1.mozilla.com generic-worker: 2018/07/25 17:36:02 Saving file file-caches.json (absolute path: /Users/cltbld/file-caches.json)
Jul 25 17:36:02 t-yosemite-r7-350.test.releng.mdc1.mozilla.com generic-worker: 2018/07/25 17:36:02 Saving file directory-caches.json (absolute path: /Users/cltbld/directory-caches.json)
Jul 25 17:36:02 t-yosemite-r7-350.test.releng.mdc1.mozilla.com generic-worker: 2018/07/25 17:36:02 goroutine 1 [running]:
Jul 25 17:36:02 t-yosemite-r7-350.test.releng.mdc1.mozilla.com generic-worker: runtime/debug.Stack(0x421b16d00, 0x142081c, 0x155bae3)
Jul 25 17:36:02 t-yosemite-r7-350.test.releng.mdc1.mozilla.com generic-worker: #011/home/travis/.gimme/versions/go1.10.2.src/src/runtime/debug/stack.go:24 +0xa7
Jul 25 17:36:02 t-yosemite-r7-350.test.releng.mdc1.mozilla.com generic-worker: main.HandleCrash(0x14aff80, 0x421f52080)
Jul 25 17:36:02 t-yosemite-r7-350.test.releng.mdc1.mozilla.com generic-worker: #011/home/travis/gopath/src/github.com/taskcluster/generic-worker/main.go:570 +0x26
Jul 25 17:36:02 t-yosemite-r7-350.test.releng.mdc1.mozilla.com generic-worker: main.RunWorker.func1(0x421b17df0)
Jul 25 17:36:02 t-yosemite-r7-350.test.releng.mdc1.mozilla.com generic-worker: #011/home/travis/gopath/src/github.com/taskcluster/generic-worker/main.go:589 +0x52
Jul 25 17:36:02 t-yosemite-r7-350.test.releng.mdc1.mozilla.com generic-worker: panic(0x14aff80, 0x421f52080)
Jul 25 17:36:02 t-yosemite-r7-350.test.releng.mdc1.mozilla.com generic-worker: #011/home/travis/.gimme/versions/go1.10.2.src/src/runtime/panic.go:502 +0x229
Jul 25 17:36:02 t-yosemite-r7-350.test.releng.mdc1.mozilla.com generic-worker: main.(*TaskRun).Run.func1(0x421dde028, 0x4219e4a00)
Jul 25 17:36:02 t-yosemite-r7-350.test.releng.mdc1.mozilla.com generic-worker: #011/home/travis/gopath/src/github.com/taskcluster/generic-worker/main.go:1086 +0xc5
Jul 25 17:36:02 t-yosemite-r7-350.test.releng.mdc1.mozilla.com generic-worker: panic(0x14aff80, 0x421f52080)
Jul 25 17:36:02 t-yosemite-r7-350.test.releng.mdc1.mozilla.com generic-worker: #011/home/travis/.gimme/versions/go1.10.2.src/src/runtime/panic.go:502 +0x229
Jul 25 17:36:02 t-yosemite-r7-350.test.releng.mdc1.mozilla.com generic-worker: panic(0x14aff80, 0x421fa4050)
Jul 25 17:36:02 t-yosemite-r7-350.test.releng.mdc1.mozilla.com generic-worker: #011/home/travis/.gimme/versions/go1.10.2.src/src/runtime/panic.go:502 +0x229
Jul 25 17:36:02 t-yosemite-r7-350.test.releng.mdc1.mozilla.com generic-worker: main.(*TaskRun).uploadArtifact(0x4219e4a00, 0x1604580, 0x421e940f0, 0x0)
Jul 25 17:36:02 t-yosemite-r7-350.test.releng.mdc1.mozilla.com generic-worker: #011/home/travis/gopath/src/github.com/taskcluster/generic-worker/artifacts.go:467 +0x1057
Jul 25 17:36:02 t-yosemite-r7-350.test.releng.mdc1.mozilla.com generic-worker: main.(*TaskRun).uploadLog(0x4219e4a00, 0x155e471, 0x1c, 0x421d24fc0, 0x1f, 0x1)
Jul 25 17:36:02 t-yosemite-r7-350.test.releng.mdc1.mozilla.com generic-worker: #011/home/travis/gopath/src/github.com/taskcluster/generic-worker/artifacts.go:411 +0x12f
Jul 25 17:36:02 t-yosemite-r7-350.test.releng.mdc1.mozilla.com generic-worker: main.(*TaskRun).Run.func2(0x421dde028, 0x4219e4a00, 0x421dde030)
Jul 25 17:36:02 t-yosemite-r7-350.test.releng.mdc1.mozilla.com generic-worker: #011/home/travis/gopath/src/github.com/taskcluster/generic-worker/main.go:1101 +0xe6
Jul 25 17:36:02 t-yosemite-r7-350.test.releng.mdc1.mozilla.com generic-worker: panic(0x14aff80, 0x421f52080)
Jul 25 17:36:02 t-yosemite-r7-350.test.releng.mdc1.mozilla.com generic-worker: #011/home/travis/.gimme/versions/go1.10.2.src/src/runtime/panic.go:502 +0x229
Jul 25 17:36:02 t-yosemite-r7-350.test.releng.mdc1.mozilla.com generic-worker: main.(*TaskRun).uploadArtifact(0x4219e4a00, 0x1604580, 0x421f85bc0, 0x1c)
Jul 25 17:36:02 t-yosemite-r7-350.test.releng.mdc1.mozilla.com generic-worker: #011/home/travis/gopath/src/github.com/taskcluster/generic-worker/artifacts.go:467 +0x1057
Jul 25 17:36:02 t-yosemite-r7-350.test.releng.mdc1.mozilla.com generic-worker: main.(*TaskRun).Run.func4(0x4219e4a00, 0x421dde028)
Jul 25 17:36:02 t-yosemite-r7-350.test.releng.mdc1.mozilla.com generic-worker: #011/home/travis/gopath/src/github.com/taskcluster/generic-worker/main.go:1194 +0x23c
Jul 25 17:36:02 t-yosemite-r7-350.test.releng.mdc1.mozilla.com generic-worker: main.(*TaskRun).Run(0x4219e4a00, 0x421af80a0)
Jul 25 17:36:02 t-yosemite-r7-350.test.releng.mdc1.mozilla.com generic-worker: #011/home/travis/gopath/src/github.com/taskcluster/generic-worker/main.go:1255 +0x17cb
Jul 25 17:36:02 t-yosemite-r7-350.test.releng.mdc1.mozilla.com generic-worker: main.RunWorker(0x0)
Jul 25 17:36:02 t-yosemite-r7-350.test.releng.mdc1.mozilla.com generic-worker: #011/home/travis/gopath/src/github.com/taskcluster/generic-worker/main.go:671 +0xd4f
Jul 25 17:36:02 t-yosemite-r7-350.test.releng.mdc1.mozilla.com generic-worker: main.main()
Jul 25 17:36:02 t-yosemite-r7-350.test.releng.mdc1.mozilla.com generic-worker: #011/home/travis/gopath/src/github.com/taskcluster/generic-worker/main.go:400 +0x608
Jul 25 17:36:02 t-yosemite-r7-350.test.releng.mdc1.mozilla.com generic-worker: 2018/07/25 17:36:02 *********** PANIC occurred! ***********
Jul 25 17:36:03 t-yosemite-r7-350.test.releng.mdc1.mozilla.com generic-worker: 2018/07/25 17:36:02 WORKER EXCEPTION due to response code 401 from Queue when uploading artifact &main.S3Artifact{BaseArtifact:(*main.BaseArtifact)(0x421ddc400), Path:"logs/localconfig.json", ContentEncoding:""} with CreateArtifact payload {"contentType":"application/json","expires":"2019-07-25T19:59:05.115Z","storageType":"s3"}
Jul 25 17:36:04 t-yosemite-r7-350.test.releng.mdc1.mozilla.com generic-worker: 2018/07/25 17:36:04 Exiting worker with exit code 69
...
Jul 25 17:36:05 t-yosemite-r7-350.test.releng.mdc1.mozilla.com generic-worker: 2018/07/25 17:36:05 Removing task directory '/Users/cltbld/tasks/task_1532549048'...
...
Jul 25 17:36:10 t-yosemite-r7-350.test.releng.mdc1.mozilla.com generic-worker: 2018/07/25 17:36:10 All features initialised.
Jul 25 17:36:10 t-yosemite-r7-350.test.releng.mdc1.mozilla.com generic-worker: 2018/07/25 17:36:10 Created dir: /Users/cltbld/tasks/task_1532565370/generic-worker
...
Jul 25 20:59:48 t-yosemite-r7-350.test.releng.mdc1.mozilla.com generic-worker: 2018/07/25 20:59:48 Disk available: 234181001216 bytes
Jul 25 20:59:52 t-yosemite-r7-350.test.releng.mdc1.mozilla.com powerd: PID 50(powerd) TimedOut InternalPreventSleep "com.apple.powermanagement.acwakelinger" 00:00:45 id:0xd0000013d [System: SRPrevSleep kCPU]
Jul 25 20:59:52 t-yosemite-r7-350.test.releng.mdc1.mozilla.com powerd: Summary- [System: No Assertions] Using AC
Jul 25 20:59:52 t-yosemite-r7-350.test.releng.mdc1.mozilla.com configd: store_notifier: changedKeys <array> {
Jul 25 20:59:52 t-yosemite-r7-350.test.releng.mdc1.mozilla.com configd: 0 : State:/IOKit/SystemPowerCapabilities
Jul 25 20:59:52 t-yosemite-r7-350.test.releng.mdc1.mozilla.com configd: }
Jul 25 20:59:52 t-yosemite-r7-350.test.releng.mdc1.mozilla.com configd: store_notifier: powerkey 0
Jul 25 20:59:52 t-yosemite-r7-350.test.releng.mdc1.mozilla.com powerd: Entering Sleep state due to 'Maintenance Sleep': Using AC TCPKeepAlive=inactive
Jul 25 20:59:52 t-yosemite-r7-350.test.releng.mdc1.mozilla.com configd: SCNC Controller: pm_ConnectionHandler capabilities = 0x0, sleeping = 0 and DarkWake = 1.
Jul 25 20:59:52 t-yosemite-r7-350.test.releng.mdc1.mozilla.com configd: SCNC Controller: pm_ConnectionHandler going to sleep, delay = 0.
Jul 25 20:59:52 t-yosemite-r7-350.test.releng.mdc1.mozilla.com airportd: _configureScanOffloadParameters: Unable to configure scan offloading on en1 (Device power is off)
```
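For triaging a long excerpt like the one above, a small filter can pull out just the lines that matter. This is a minimal sketch, not an existing tool: the event names and the `find_events` helper are made up for illustration, the patterns are copied from phrases in the log above, and the hostname in the sample lines is shortened to `host`.

```python
import re

# Patterns copied from phrases in the log excerpt; the event names
# ("panic", "worker_exit", "sleep") are hypothetical labels.
EVENT_PATTERNS = {
    "panic": re.compile(r"PANIC occurred!"),
    "worker_exit": re.compile(r"Exiting worker with exit code (\d+)"),
    "sleep": re.compile(r"Entering Sleep state due to '([^']+)'"),
}


def find_events(lines):
    """Return (event_name, detail) pairs in the order they appear."""
    events = []
    for line in lines:
        for name, pattern in EVENT_PATTERNS.items():
            match = pattern.search(line)
            if match:
                # detail is the captured group if the pattern has one
                detail = match.group(1) if match.groups() else None
                events.append((name, detail))
    return events


# Sample lines taken from the log above, with the hostname shortened.
SAMPLE = [
    "Jul 25 17:36:02 host generic-worker: 2018/07/25 17:36:02 *********** PANIC occurred! ***********",
    "Jul 25 17:36:04 host generic-worker: 2018/07/25 17:36:04 Exiting worker with exit code 69",
    "Jul 25 17:36:10 host generic-worker: 2018/07/25 17:36:10 All features initialised.",
    "Jul 25 20:59:52 host powerd: Entering Sleep state due to 'Maintenance Sleep': Using AC TCPKeepAlive=inactive",
]

if __name__ == "__main__":
    for name, detail in find_events(SAMPLE):
        print(name, detail)
```

On the sample this reports the panic, the exit with code 69, and the 'Maintenance Sleep' entry roughly three and a half hours later, which matches the sequence described in the report.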
papertrail logs "last seen 1 day ago" https://papertrailapp.com/groups/1223184?filter=t-yosemite-r7-350
no ping
no ssh (times out)
```
[dhouse@rejh2.srv.releng.mdc1.mozilla.com ~]$ ssh root@t-yosemite-r7-350.test.releng.mdc1.mozilla.com
ssh: connect to host t-yosemite-r7-350.test.releng.mdc1.mozilla.com port 22: Connection timed out
[dhouse@rejh2.srv.releng.mdc1.mozilla.com ~]$ ping t-yosemite-r7-350.test.releng.mdc1.mozilla.com
PING t-yosemite-r7-350.test.releng.mdc1.mozilla.com (10.49.56.134) 56(84) bytes of data.
^C
--- t-yosemite-r7-350.test.releng.mdc1.mozilla.com ping statistics ---
233 packets transmitted, 0 received, 100% packet loss, time 232765ms
```
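The same "no ssh" check can be scripted when several workers need to be looked at. A minimal sketch, assuming only that a TCP connect to port 22 is a fair proxy for ssh being reachable; `tcp_port_open` is a hypothetical helper, and it does not replace ping, which uses ICMP.

```python
import socket


def tcp_port_open(host, port=22, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and name resolution failures.
        return False


if __name__ == "__main__":
    host = "t-yosemite-r7-350.test.releng.mdc1.mozilla.com"
    state = "reachable" if tcp_port_open(host) else "unreachable"
    print(f"{host} port 22: {state}")
```

A timeout (as in the output above) and a refused connection both come back as `False` here; if that distinction matters, catch `socket.timeout` separately.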
pdu shows power on, but I think the mini may be shut down while the outlet is still on
I rebooted through roller. Minimal logs appeared in papertrail, but there was still no response to ping or ssh.
```
Jul 27 12:18:19 t-yosemite-r7-350.test.releng.mdc1.mozilla.com generic-worker: 2018/07/27 12:18:22 Error: Post https://queue.taskcluster.net/v1/claim-work/releng-hardware/gecko-t-osx-1010: read tcp 10.49.56.134:52016->184.72.216.59:443: read: operation timed out
Jul 27 12:18:20 t-yosemite-r7-350.test.releng.mdc1.mozilla.com generic-worker: 2018/07/27 12:18:24 No task claimed. Idle for 42h42m13.995790122s (will exit if no task claimed in 53h17m46.004209878s). 1 more tasks to run before exiting.
Jul 27 12:18:25 t-yosemite-r7-350.test.releng.mdc1.mozilla.com generic-worker: 2018/07/27 12:18:29 Disk available: 234206502912 bytes
```
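The "No task claimed" line above carries two durations in Go's duration format. Adding them gives the worker's total idle allowance. This is a sketch that only handles the `h`, `m` and `s` units seen in that line (not `ms`, `us` or `ns`), and `go_duration_seconds` is a hypothetical helper. The resulting figure is inferred from the log arithmetic, not read from the worker's configuration.

```python
import re
from decimal import Decimal

# Seconds per unit for the units that appear in the log line.
_UNIT_SECONDS = {"h": Decimal(3600), "m": Decimal(60), "s": Decimal(1)}
_PART = re.compile(r"(\d+(?:\.\d+)?)([hms])")


def go_duration_seconds(text):
    """Convert a string such as '42h42m13.995790122s' to seconds (Decimal)."""
    total = Decimal(0)
    pos = 0
    for match in _PART.finditer(text):
        if match.start() != pos:
            raise ValueError(f"unexpected text in duration: {text!r}")
        total += Decimal(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        pos = match.end()
    if pos == 0 or pos != len(text):
        raise ValueError(f"not a supported duration: {text!r}")
    return total


if __name__ == "__main__":
    idle = go_duration_seconds("42h42m13.995790122s")
    remaining = go_duration_seconds("53h17m46.004209878s")
    print("idle + remaining, in hours:", (idle + remaining) / 3600)
```

The two values sum to exactly 96 hours, which suggests the idle limit on this worker was 96h and that about 42.7 hours had passed since its last claimed task.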
```
[dhouse@rejh2.srv.releng.mdc1.mozilla.com ~]$ ssh root@t-yosemite-r7-350.test.releng.mdc1.mozilla.com
^C
[dhouse@rejh2.srv.releng.mdc1.mozilla.com ~]$ ping t-yosemite-r7-350.test.releng.mdc1.mozilla.com
PING t-yosemite-r7-350.test.releng.mdc1.mozilla.com (10.49.56.134) 56(84) bytes of data.
^C
--- t-yosemite-r7-350.test.releng.mdc1.mozilla.com ping statistics ---
63 packets transmitted, 0 received, 100% packet loss, time 62194ms
```
I manually powered off the machine (pdu power off), waited a few seconds, and then powered it back on. It briefly responded to ping and ssh (prompting for a password), but then stopped responding to both. The same logs appeared in papertrail and then stopped.
After another reboot, ping/ssh stayed and the machine looks normal. I've removed the quarantine to see if it has any trouble running tasks.
Looks like no problems. Has completed 2 tasks and started another.
https://tools.taskcluster.net/provisioners/releng-hardware/worker-types/gecko-t-osx-1010/workers/mdc1/t-yosemite-r7-350
https://papertrailapp.com/systems/t-yosemite-r7-350.test.releng.mdc1.mozilla.com/events
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Updated•5 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard