Closed Bug 1437290 Opened 8 years ago Closed 7 years ago

io: read/write on closed pipe

Categories

(Taskcluster :: Workers, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: pmoore, Unassigned)

Details

This happened once during test run (https://public-artifacts.taskcluster.net/aePQc6bgQuGpcH2PmA4cuQ/0/public/logs/live_backing.log): === RUN TestMissingArtifactFailsTest 2018/02/08 20:02:32 Detected darwin platform 2018/02/08 20:02:32 No file-caches.json file found, creating empty CacheMap 2018/02/08 20:02:32 No directory-caches.json file found, creating empty CacheMap 2018/02/08 20:02:32 Created dir: /Users/genericworker/generic-worker/tasks/task_1518114986/gopath/src/github.com/taskcluster/generic-worker/testdata/TestMissingArtifactFailsTest/task_1518116552/generic-worker 2018/02/08 20:02:37 Task found 2018/02/08 20:02:37 Setting reclaim timer... 2018/02/08 20:02:37 Current claim will expire at 2018-02-08 19:22:36.698 +0000 UTC 2018/02/08 20:02:37 Reclaiming 3 mins earlier, at 2018-02-08 19:19:36.698 +0000 UTC 2018/02/08 20:02:37 Time to wait until then is 16m59.512919s 2018/02/08 20:02:37 This is more than 30 seconds away - so setting a timer 2018/02/08 20:02:37 JSON payload: { "artifacts": [ { "expires": "2018-02-08T19:32:31.107Z", "path": "Nonexistent/art i fact.txt", "type": "file" } ], "command": [ [ "echo", "hello world!" ], [ "echo", "goodbye world!" ] ], "features": {}, "maxRunTime": 30 } 2018/02/08 20:02:37 Running task https://tools.taskcluster.net/task-inspector/#dZib1e-0TZ2jFGxS-1ho8g/0 2018/02/08 20:02:37 Environment: []string{"SERVER_KEY_FILE=/Users/genericworker/generic-worker/livelog.key", "LIVELOG_PUT_PORT=60022", "SHLVL=3", "LIVELOG_GET_PORT=60023", "GOROOT=/Users/genericworker/generic-worker/tasks/task_1518114986/go1.8.3/go", "GORACE=history_size=7", "TERM=xterm-256color", "GW_CREDS_BOOTSTRAP=/Users/genericworker/generic-worker/gw_private_env.sh", "SERVER_CRT_FILE=/Users/genericworker/generic-worker/livelog.crt", "TASKCLUSTER_CLIENT_ID=project/taskcluster/generic-worker/taskcluster-ci", "TASKCLUSTER_CREDS_BOOTSTRAP=/Users/genericworker/generic-worker/private_env.sh", "OLDPWD=/Users/genericworker/generic-worker/tasks/task_1518114986/gopath/src/github.com/taskcluster/generic-worker/gw-codegen", "TASK_ID=dZib1e-0TZ2jFGxS-1ho8g", "USER=genericworker", "HOME=/Users/genericworker", "GOPATH=/Users/genericworker/generic-worker/tasks/task_1518114986/gopath", "_=/Users/genericworker/generic-worker/tasks/task_1518114986/go1.8.3/go/bin/go", "SHELL=/bin/bash", "PATH=/Users/genericworker/generic-worker/tasks/task_1518114986/gopath/bin:/Users/genericworker/generic-worker/tasks/task_1518114986/go1.8.3/go/bin:/Users/genericworker/go/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/go/bin:/usr/local/MacGPG2/bin:/opt/X11/bin", "PWD=/Users/genericworker/generic-worker/tasks/task_1518114986/gopath/src/github.com/taskcluster/generic-worker", "ACCESS_TOKEN=FQAlzKxBQp6TEBPQ9pysYA"} 2018/02/08 20:02:37 Environment: []string{"PWD=/Users/genericworker/generic-worker/tasks/task_1518114986/gopath/src/github.com/taskcluster/generic-worker", "ACCESS_TOKEN=FQAlzKxBQp6TEBPQ9pysYA", "HOME=/Users/genericworker", "OLDPWD=/Users/genericworker/generic-worker/tasks/task_1518114986/gopath/src/github.com/taskcluster/generic-worker/gw-codegen", "TASK_ID=dZib1e-0TZ2jFGxS-1ho8g", "PATH=/Users/genericworker/generic-worker/tasks/task_1518114986/gopath/bin:/Users/genericworker/generic-worker/tasks/task_1518114986/go1.8.3/go/bin:/Users/genericworker/go/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/go/bin:/usr/local/MacGPG2/bin:/opt/X11/bin", "SERVER_CRT_FILE=/Users/genericworker/generic-worker/livelog.crt", "SHELL=/bin/bash", "LIVELOG_PUT_PORT=60022", "SHLVL=3", "GOROOT=/Users/genericworker/generic-worker/tasks/task_1518114986/go1.8.3/go", "TASKCLUSTER_CLIENT_ID=project/taskcluster/generic-worker/taskcluster-ci", "_=/Users/genericworker/generic-worker/tasks/task_1518114986/go1.8.3/go/bin/go", "SERVER_KEY_FILE=/Users/genericworker/generic-worker/livelog.key", "GW_CREDS_BOOTSTRAP=/Users/genericworker/generic-worker/gw_private_env.sh", "USER=genericworker", "LIVELOG_GET_PORT=60023", "GORACE=history_size=7", "TASKCLUSTER_CREDS_BOOTSTRAP=/Users/genericworker/generic-worker/private_env.sh", "TERM=xterm-256color", "GOPATH=/Users/genericworker/generic-worker/tasks/task_1518114986/gopath"} 2018/02/08 20:02:38 Disk available: 110918766592 bytes 2018/02/08 20:02:38 Executing command 0: ["echo" "hello world!"] 2018/02/08 20:02:39 Redirecting public/logs/live.log to public/logs/live_backing.log 2018/02/08 20:02:40 Request 2018/02/08 20:02:40 PUT /dZib1e-0TZ2jFGxS-1ho8g/0/public/logs/live_backing.log?AWSAccessKeyId=AKIAJQESBGXODWDRTZUA&Content-Type=text%2Fplain%3B%20charset%3Dutf-8&Expires=1518118369&Signature=NYTT3LcD61wsahpcFFXFRcQRVBM%3D HTTP/1.1 Host: taskcluster-public-artifacts.s3-us-west-2.amazonaws.com User-Agent: Go-http-client/1.1 Content-Length: 1192 Content-Encoding: gzip Content-Type: text/plain; charset=utf-8 Accept-Encoding: gzip 2018/02/08 20:02:40 1 put requests issued to https://taskcluster-public-artifacts.s3-us-west-2.amazonaws.com/dZib1e-0TZ2jFGxS-1ho8g/0/public/logs/live_backing.log?AWSAccessKeyId=AKIAJQESBGXODWDRTZUA&Content-Type=text%2Fplain%3B%20charset%3Dutf-8&Expires=1518118369&Signature=NYTT3LcD61wsahpcFFXFRcQRVBM%3D 2018/02/08 20:02:40 Response 2018/02/08 20:02:40 HTTP/1.1 200 OK Content-Length: 0 Date: Thu, 08 Feb 2018 19:02:41 GMT Etag: "83a532335092a1f114bb46420622b1ab" Server: AmazonS3 X-Amz-Id-2: 6ZO/iquJxNp6aMO5ApMR9JgVh6drAiwvZDuziNp7olAHHf9uhp5du0wZR0tEFMAxmyMZ0IaJ9Pg= X-Amz-Request-Id: 6D3DA2A3150F96F5 X-Amz-Version-Id: 11uHe2TRgsBbbJ.LFMsOJXhHtoxj6pLW 2018/02/08 20:02:40 Resolving task... 2018/02/08 20:02:40 Waiting for task status lock for task dZib1e-0TZ2jFGxS-1ho8g... 2018/02/08 20:02:40 Lock for task status for task dZib1e-0TZ2jFGxS-1ho8g acquired. 2018/02/08 20:02:41 Releasing task status lock for task dZib1e-0TZ2jFGxS-1ho8g... 2018/02/08 20:02:41 Lock for task status for task dZib1e-0TZ2jFGxS-1ho8g released. 2018/02/08 20:02:41 Saving file file-caches.json with content: {} 2018/02/08 20:02:41 Saving file directory-caches.json with content: {} 2018/02/08 20:02:41 goroutine 9 [running]: runtime/debug.Stack(0x1774f95, 0xc42000e130, 0x0) /Users/genericworker/generic-worker/tasks/task_1518114986/go1.8.3/go/src/runtime/debug/stack.go:24 +0x87 github.com/taskcluster/generic-worker.HandleCrash(0x1801dc0, 0xc420070070) /Users/genericworker/generic-worker/tasks/task_1518114986/gopath/src/github.com/taskcluster/generic-worker/main.go:518 +0x45 github.com/taskcluster/generic-worker.RunWorker.func1(0xc420645ca0) /Users/genericworker/generic-worker/tasks/task_1518114986/gopath/src/github.com/taskcluster/generic-worker/main.go:527 +0x65 panic(0x1801dc0, 0xc420070070) /Users/genericworker/generic-worker/tasks/task_1518114986/go1.8.3/go/src/runtime/panic.go:489 +0x2f0 github.com/taskcluster/generic-worker.(*TaskRun).Run.func1(0xc42000e080, 0xc420084300) /Users/genericworker/generic-worker/tasks/task_1518114986/gopath/src/github.com/taskcluster/generic-worker/main.go:1050 +0x1bf panic(0x1801dc0, 0xc420070070) /Users/genericworker/generic-worker/tasks/task_1518114986/go1.8.3/go/src/runtime/panic.go:489 +0x2f0 github.com/taskcluster/generic-worker.(*TaskRun).Run.func3(0xc42000e080, 0xc420084300, 0xc42000e088) /Users/genericworker/generic-worker/tasks/task_1518114986/gopath/src/github.com/taskcluster/generic-worker/main.go:1083 +0x24a panic(0x1801dc0, 0xc420070070) /Users/genericworker/generic-worker/tasks/task_1518114986/go1.8.3/go/src/runtime/panic.go:489 +0x2f0 github.com/taskcluster/generic-worker.(*TaskRun).ExecuteCommand(0xc420084300, 0x0, 0xc420084300) /Users/genericworker/generic-worker/tasks/task_1518114986/gopath/src/github.com/taskcluster/generic-worker/main.go:941 +0x6bf github.com/taskcluster/generic-worker.(*TaskRun).Run(0xc420084300, 0x0) /Users/genericworker/generic-worker/tasks/task_1518114986/gopath/src/github.com/taskcluster/generic-worker/main.go:1197 +0x2449 github.com/taskcluster/generic-worker.FindAndRunTask(0x12a05f200) /Users/genericworker/generic-worker/tasks/task_1518114986/gopath/src/github.com/taskcluster/generic-worker/main.go:717 +0x9dc github.com/taskcluster/generic-worker.RunWorker(0x0) /Users/genericworker/generic-worker/tasks/task_1518114986/gopath/src/github.com/taskcluster/generic-worker/main.go:602 +0x979 github.com/taskcluster/generic-worker.execute(0xc42032d040) /Users/genericworker/generic-worker/tasks/task_1518114986/gopath/src/github.com/taskcluster/generic-worker/helper_test.go:189 +0xf2 github.com/taskcluster/generic-worker.scheduleAndExecute(0xc42032d040, 0xc420386c00, 0xc42006ea20, 0x1, 0x1, 0xc4202309c0, 0x2, 0x2, 0x0, 0x0, ...) /Users/genericworker/generic-worker/tasks/task_1518114986/gopath/src/github.com/taskcluster/generic-worker/helper_test.go:198 +0xbc github.com/taskcluster/generic-worker.TestMissingArtifactFailsTest(0xc42032d040) /Users/genericworker/generic-worker/tasks/task_1518114986/gopath/src/github.com/taskcluster/generic-worker/artifacts_test.go:424 +0x413 testing.tRunner(0xc42032d040, 0x191c4e0) /Users/genericworker/generic-worker/tasks/task_1518114986/go1.8.3/go/src/testing/testing.go:657 +0x108 created by testing.(*T).Run /Users/genericworker/generic-worker/tasks/task_1518114986/go1.8.3/go/src/testing/testing.go:697 +0x544 2018/02/08 20:02:41 *********** PANIC occurred! *********** 2018/02/08 20:02:41 io: read/write on closed pipe 2018/02/08 20:02:41 No sentry project defined, not reporting to sentry --- FAIL: TestMissingArtifactFailsTest (10.19s) helper_test.go:192: Something went wrong executing worker - got exit code 69 but was expecting exit code 0 Maybe a race condition? It doesn't happen consistently.
It looks like this is coming from the runlib library: https://github.com/taskcluster/runlib/blob/7fa257e0aaed6b00edeea02f62c38e67ff4b4924/subprocess/subprocess.go#L130-L137 Either sub.CreateFrozen() or d.SetupRedirectionBuffers() is returning the error "io: read/write on closed pipe".
It may be we no longer need the full runlib library in generic-worker 10 since we considerably simplified things by no longer needing to created desktops etc. However, if we do remove it, we should make sure we don't introduce regressions from e.g. bug 1303455 and bug 1330718.
(In reply to Pete Moore [:pmoore][:pete] from comment #1) > It looks like this is coming from the runlib library: > > https://github.com/taskcluster/runlib/blob/ > 7fa257e0aaed6b00edeea02f62c38e67ff4b4924/subprocess/subprocess.go#L130-L137 > > Either sub.CreateFrozen() or d.SetupRedirectionBuffers() is returning the > error "io: read/write on closed pipe". I suspect the issue is coming from one of the `startAfterStart` functions: https://github.com/taskcluster/runlib/blob/7fa257e0aaed6b00edeea02f62c38e67ff4b4924/subprocess/redirects.go#L28-L32 https://github.com/taskcluster/runlib/blob/7fa257e0aaed6b00edeea02f62c38e67ff4b4924/subprocess/redirects.go#L73-L79 I think these get executed after the process starts, and perhaps an io.Copy fails, because the process has already completed.
QA Contact: pmoore
I haven't seen this since 24 April, which was on generic-worker 10.7.9: https://sentry.prod.mozaws.net/operations/generic-worker-tests/issues/3353048/ I'm hoping this is now solved, but will keep it open for a while, to be sure.
5 months with no further issues -- sounds like it's fixed then!
Flags: needinfo?(pmoore)
Yeah, let's call this done! :-)
Status: NEW → RESOLVED
Closed: 7 years ago
Flags: needinfo?(pmoore)
Resolution: --- → FIXED
Component: Generic-Worker → Workers
You need to log in before you can comment on or make changes to this bug.