Closed Bug 990681 Opened 10 years ago Closed 10 years ago

docker-worker, pulling image error: Connection reset by peer

Categories

(Taskcluster :: Workers, defect)

x86_64
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jonasfj, Unassigned)

Details

For some reason, some pulls fail with "Connection reset by peer". I've seen this error more than once, and I suspect it has to do with pulling big images.

Check out run 1:
http://docs.taskcluster.net/tools/task-inspector/#TBNESMHsTVCVkk-bGDEY3g

We could retry the pull a few times, upgrade the docker version to see if that helps, or submit a patch to docker that retries HTTP GETs.
Perhaps we should also look into supporting byte-range requests when we redirect to S3 in registry.taskcluster.net.
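
As a rough illustration of the "pull multiple times" option, here is a minimal retry sketch. The `retry` helper, its parameters, and the linear backoff are assumptions made for this example; `operation` stands in for whatever actually performs the image pull and is not a docker-worker API.

```javascript
// Hypothetical helper: retry an async operation a fixed number of times.
// `operation` is a placeholder for the code that pulls the image.
async function retry(operation, attempts = 3, delayMs = 1000) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await operation(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < attempts) {
        // Wait a little longer after each failure (linear backoff).
        await new Promise((resolve) => setTimeout(resolve, delayMs * attempt));
      }
    }
  }
  // Every attempt failed; surface the last error to the caller.
  throw lastError;
}
```

Note that retrying only papers over a transient failure; it does not help if every pull fails for the same underlying reason.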
So I tracked this down to back-pressure from azureLiveLog.
Basically, node streams do not merge chunks that build up in the buffer. So when each line is written as its own chunk, azureLiveLog uploads and commits one line at a time. This causes back-pressure, which causes docker pull to fail.

I suspect it also caused the docker container to block waiting for stdout to be unblocked.
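
The behaviour described above can be reproduced with a small sketch. The `slowSink` below is a hypothetical stand-in for a per-request uploader such as azureLiveLog, and the 10 ms delay is an arbitrary assumption; it only implements `write`, so each buffered chunk is handed over separately.

```javascript
const { Writable } = require('stream');

// Records every chunk handed to the slow consumer.
const seen = [];

// A slow sink standing in for an uploader that does one request per chunk.
const slowSink = new Writable({
  write(chunk, encoding, callback) {
    seen.push(chunk.toString());
    setTimeout(callback, 10); // simulate one slow upload per chunk
  },
});

// Three lines written back to back, before the first "upload" completes.
slowSink.write('line 1\n');
slowSink.write('line 2\n');
slowSink.end('line 3\n');

slowSink.on('finish', () => {
  // Each line reached write() on its own; the stream did not merge them.
  console.log('chunks delivered:', seen.length);
});
```

With many small writes and a slow sink, the internal buffer fills up and `write()` starts returning `false`, which is the back-pressure signal that propagates to the producer.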

This was fixed in https://github.com/taskcluster/docker-worker/pull/29
We now just buffer up all output and send it to azure every 5s. This way azure livelog will always be committing at least 5 seconds' worth of log.
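
The following is a minimal sketch of interval-based buffering, not the code from the pull request. The `IntervalBuffer` class and the `upload` callback are hypothetical names; `upload` stands in for the function that commits a block of log to azure.

```javascript
// Hypothetical helper: collect chunks and hand them to `upload` as one
// merged buffer at a fixed interval.
class IntervalBuffer {
  constructor(upload, intervalMs = 5000) {
    this.upload = upload; // called with one merged Buffer per flush
    this.pending = [];
    this.timer = setInterval(() => this.flush(), intervalMs);
    this.timer.unref(); // do not keep the process alive just for the timer
  }

  write(chunk) {
    this.pending.push(Buffer.from(chunk));
  }

  flush() {
    if (this.pending.length === 0) return; // nothing new, skip the request
    const merged = Buffer.concat(this.pending);
    this.pending = [];
    this.upload(merged);
  }

  close() {
    // Stop the timer and send whatever is left.
    clearInterval(this.timer);
    this.flush();
  }
}
```

Because empty intervals are skipped, the number of requests is bounded by one per interval regardless of how many lines the task prints.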

I think this is good enough for now. An alternative scheme would be to buffer/merge all chunks and send them as fast as possible. As we pay per request with azure, the 5s scheme is probably best.

Note: if we implement live logging with google cloud storage using chunked transfer, we'll still need some form of buffering. In that case it might be better to send as soon as the previous chunk has been sent.
But I seriously doubt that it's a good idea to send each line as its own chunk.
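
The "send as soon as the previous chunk was sent" scheme could look roughly like the sketch below. `CoalescingSender`, `send`, and `onError` are hypothetical names for this example; `send` is assumed to be an async function that transmits one merged buffer.

```javascript
// Hypothetical helper: merge whatever accumulated while the previous send
// was in flight, then send it as soon as that send completes.
class CoalescingSender {
  constructor(send, onError = console.error) {
    this.send = send; // async function taking one merged Buffer
    this.onError = onError;
    this.pending = [];
    this.sending = false;
  }

  write(chunk) {
    this.pending.push(Buffer.from(chunk));
    // Only start a drain loop if one is not already running.
    if (!this.sending) this.drain().catch((err) => this.onError(err));
  }

  async drain() {
    this.sending = true;
    try {
      while (this.pending.length > 0) {
        const merged = Buffer.concat(this.pending);
        this.pending = [];
        await this.send(merged);
      }
    } finally {
      this.sending = false;
    }
  }
}
```

Compared with a fixed interval, this keeps latency low when output is sparse and naturally batches more lines per request when the receiver is slow.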
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Component: TaskCluster → Docker-Worker
Product: Testing → Taskcluster
Component: Docker-Worker → Workers