Closed Bug 1285090 Opened 8 years ago Closed 7 years ago

Intermittent [taskcluster:error] Task was aborted because states could not be created successfully. Error calling 'link' for relengAPIProxy : Failed to initialize releng API proxy service due to: Error: timed out while opening connection

Categories

(Taskcluster :: Workers, defect)


Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: intermittent-bug-filer, Assigned: wcosta)

References

Details

(Keywords: intermittent-failure, Whiteboard: [docker-link-failure][fennec-scouting][stockwell infra])

[taskcluster 2016-07-07 02:59:10.024Z] Task ID: WH6lxISlSN2SBpVZvcwyVw
[taskcluster 2016-07-07 02:59:10.024Z] Worker ID: i-0e4a7c5a8b5039138
[taskcluster 2016-07-07 02:59:10.024Z] Worker Group: us-west-2a
[taskcluster 2016-07-07 02:59:10.024Z] Worker Type: opt-linux64
[taskcluster 2016-07-07 02:59:10.024Z] Public IP: 54.149.193.49
[taskcluster 2016-07-07 02:59:10.025Z] using cache "level-3-mozilla-inbound-tc-vcs" -> /home/worker/.tc-vcs
[taskcluster 2016-07-07 02:59:10.025Z] using cache "level-3-mozilla-inbound-build-linux64-artifact-workspace" -> /home/worker/workspace
[taskcluster 2016-07-07 02:59:10.025Z] using cache "tooltool-cache" -> /home/worker/tooltool-cache


[taskcluster:error] Task was aborted because states could not be created successfully. Error calling 'link' for relengAPIProxy : Failed to initialize releng API proxy service due to: Error: timed out while opening connection
[taskcluster 2016-07-07 02:59:40.345Z] Unsuccessful task run with exit code: -1 completed in 30.322 seconds
A more recent example:  https://tools.taskcluster.net/task-inspector/#TYJNGdeyQuWW0i11qbHygw/0

However, I can't get papertrail to show me that date range.  Greg, how would you go about that?

At a guess, this is docker having a hiccup, so maybe retries will be the fix?
Flags: needinfo?(garndt)
Whiteboard: [docker-link-failure]
Here's the syslog for that particular task:

2016-09-07T22:26:38Z    i-09dc17b16cada9939      docker-worker   {"type":"task start","source":"top","provisionerId":"aws-provisioner-v1","workerId":"i-09dc17b16cada9939","workerGroup":"us-east-1a","workerType":"desktop-test-large","workerNodeType":"m3.large","taskId":"TYJNGdeyQuWW0i11qbHygw","runId":0}
2016-09-07T22:26:38Z    i-09dc17b16cada9939      docker-worker   {"type":"next claim","source":"top","provisionerId":"aws-provisioner-v1","workerId":"i-09dc17b16cada9939","workerGroup":"us-east-1a","workerType":"desktop-test-large","workerNodeType":"m3.large","time":922781.5384615384,"primaryTaskId":"TYJNGdeyQuWW0i11qbHygw","primaryRunId":0,"taskId":"TYJNGdeyQuWW0i11qbHygw","runId":0,"takenUntil":"2016-09-07T22:46:37.873Z"}
2016-09-07T22:26:38Z    i-09dc17b16cada9939      docker-worker   Wed, 07 Sep 2016 22:26:38 GMT docker-worker:states taskId: TYJNGdeyQuWW0i11qbHygw at state: link
2016-09-07T22:26:38Z    i-09dc17b16cada9939      docker-worker   Wed, 07 Sep 2016 22:26:38 GMT taskcluster-docker-worker:features:local_live_log create live log container...
2016-09-07T22:26:38Z    i-09dc17b16cada9939      docker-worker   Wed, 07 Sep 2016 22:26:38 GMT taskcluster-docker-worker:features:bulk_log Created BulkLog using tempfile: /tmp/1473287198264.2383
2016-09-07T22:26:38Z    i-09dc17b16cada9939      docker-worker   use taskcluster-lib-scopes instead of taskcluster-base.utils
2016-09-07T22:26:38Z    i-09dc17b16cada9939      docker-worker   Wed, 07 Sep 2016 22:26:38 GMT taskcluster-docker-worker:features:local_live_log ensuring image
2016-09-07T22:26:38Z    i-09dc17b16cada9939      docker-worker   {"type":"ensure image","source":"top","provisionerId":"aws-provisioner-v1","workerId":"i-09dc17b16cada9939","workerGroup":"us-east-1a","workerType":"desktop-test-large","workerNodeType":"m3.large","image":{"name":"taskcluster/relengapi-proxy:2.0.1","type":"docker-image"}}
2016-09-07T22:26:38Z    i-09dc17b16cada9939      docker-worker   Wed, 07 Sep 2016 22:26:38 GMT docker-worker:dockerImage pull image. Image: taskcluster/relengapi-proxy:2.0.1 Attempts: 1
2016-09-07T22:26:38Z    i-09dc17b16cada9939      docker-worker   b9583a207297 - Started downloading#015
2016-09-07T22:26:38Z    i-09dc17b16cada9939      docker-worker   b9583a207297 - Downloaded in 0 seconds#015
2016-09-07T22:26:38Z    i-09dc17b16cada9939      docker-worker   fb15e825cb68 - Started downloading#015
2016-09-07T22:26:38Z    i-09dc17b16cada9939      docker-worker   fb15e825cb68 - Downloaded in 0.006 seconds#015
2016-09-07T22:26:38Z    i-09dc17b16cada9939      docker-worker   9943fffae777 - Started downloading#015
2016-09-07T22:26:38Z    i-09dc17b16cada9939      docker-worker   a3ed95caeb02 - Started downloading#015
2016-09-07T22:26:38Z    i-09dc17b16cada9939      docker-worker   a3ed95caeb02 - Downloaded in 0 seconds#015
2016-09-07T22:26:38Z    i-09dc17b16cada9939      docker-worker   e9930c279a31 - Started downloading#015
2016-09-07T22:26:39Z    i-09dc17b16cada9939      docker-worker   ce90c3007ecd - Started downloading#015
2016-09-07T22:26:39Z    i-09dc17b16cada9939      docker-worker   ce90c3007ecd - Downloaded in 0.135 seconds#015
2016-09-07T22:26:39Z    i-09dc17b16cada9939      docker-worker   f71a10e04d18 - Started downloading#015
2016-09-07T22:26:39Z    i-09dc17b16cada9939      docker-worker   f71a10e04d18 - Downloaded in 0.117 seconds#015
2016-09-07T22:26:39Z    i-09dc17b16cada9939      docker-worker   e9930c279a31 - Downloaded in 0.833 seconds#015
2016-09-07T22:26:41Z    i-09dc17b16cada9939      docker-worker   9943fffae777 - Downloaded in 2.217 seconds#015
2016-09-07T22:26:49Z    i-09dc17b16cada9939      docker-worker   Digest: sha256:e4c2fc25fc5e1c48ba624446e0cc6061663f8e85a94496559b8658d0a3ae0b2f#015
2016-09-07T22:26:49Z    i-09dc17b16cada9939      docker-worker   Status: Downloaded newer image for taskcluster/relengapi-proxy:2.0.1#015
2016-09-07T22:26:49Z    i-09dc17b16cada9939      docker-worker   {"type":"ensure image","source":"top","provisionerId":"aws-provisioner-v1","workerId":"i-09dc17b16cada9939","workerGroup":"us-east-1a","workerType":"desktop-test-large","workerNodeType":"m3.large","image":{"name":"taskcluster/livelog:v3","type":"docker-image"}}
2016-09-07T22:26:49Z    i-09dc17b16cada9939      docker-worker   Wed, 07 Sep 2016 22:26:49 GMT docker-worker:dockerImage pull image. Image: taskcluster/livelog:v3 Attempts: 1
2016-09-07T22:26:49Z    i-09dc17b16cada9939      kernel  [   65.957915] aufs au_opts_parse:1155:docker[1856]: unknown option dirperm1
2016-09-07T22:26:49Z    i-09dc17b16cada9939      kernel  [   66.040142] device vethac13487 entered promiscuous mode
2016-09-07T22:26:49Z    i-09dc17b16cada9939      kernel  [   66.040310] IPv6: ADDRCONF(NETDEV_UP): vethac13487: link is not ready
2016-09-07T22:26:49Z    i-09dc17b16cada9939      kernel  [   66.040315] docker0: port 1(vethac13487) entered forwarding state
2016-09-07T22:26:49Z    i-09dc17b16cada9939      kernel  [   66.040320] docker0: port 1(vethac13487) entered forwarding state
2016-09-07T22:26:49Z    i-09dc17b16cada9939      kernel  [   66.040655] docker0: port 1(vethac13487) entered disabled state
2016-09-07T22:26:49Z    i-09dc17b16cada9939      kernel  [   66.131237] IPv6: ADDRCONF(NETDEV_CHANGE): vethac13487: link becomes ready
2016-09-07T22:26:49Z    i-09dc17b16cada9939      kernel  [   66.131275] docker0: port 1(vethac13487) entered forwarding state
2016-09-07T22:26:49Z    i-09dc17b16cada9939      kernel  [   66.131281] docker0: port 1(vethac13487) entered forwarding state
2016-09-07T22:26:49Z    i-09dc17b16cada9939      kernel  [   66.131297] IPv6: ADDRCONF(NETDEV_CHANGE): docker0: link becomes ready
2016-09-07T22:26:49Z    i-09dc17b16cada9939      docker-worker   77c6c00e8b61 - Started downloading#015
2016-09-07T22:26:49Z    i-09dc17b16cada9939      docker-worker   61b90c8094ba - Started downloading#015
2016-09-07T22:26:49Z    i-09dc17b16cada9939      docker-worker   61b90c8094ba - Downloaded in 0.001 seconds#015
2016-09-07T22:26:49Z    i-09dc17b16cada9939      docker-worker   a3ed95caeb02 - Started downloading#015
2016-09-07T22:26:49Z    i-09dc17b16cada9939      docker-worker   a3ed95caeb02 - Downloaded in 0.001 seconds#015
2016-09-07T22:26:49Z    i-09dc17b16cada9939      docker-worker   96b542aa9fae - Started downloading#015
2016-09-07T22:26:49Z    i-09dc17b16cada9939      docker-worker   96b542aa9fae - Downloaded in 0.001 seconds#015
2016-09-07T22:26:49Z    i-09dc17b16cada9939      docker-worker   77c6c00e8b61 - Downloaded in 0.177 seconds#015
2016-09-07T22:26:49Z    i-09dc17b16cada9939      docker-worker   91e17655d143 - Started downloading#015
2016-09-07T22:26:49Z    i-09dc17b16cada9939      docker-worker   91e17655d143 - Downloaded in 0.004 seconds#015
2016-09-07T22:26:49Z    i-09dc17b16cada9939      docker-worker   bdd6cb71a978 - Started downloading#015
2016-09-07T22:26:49Z    i-09dc17b16cada9939      docker-worker   bdd6cb71a978 - Downloaded in 0.001 seconds#015
2016-09-07T22:26:49Z    i-09dc17b16cada9939      docker-worker   03f992b49be8 - Started downloading#015
2016-09-07T22:26:49Z    i-09dc17b16cada9939      docker-worker   03f992b49be8 - Downloaded in 0.04 seconds#015
2016-09-07T22:26:49Z    i-09dc17b16cada9939      docker-worker   8a34723a0628 - Started downloading#015
2016-09-07T22:26:49Z    i-09dc17b16cada9939      docker-worker   8a34723a0628 - Downloaded in 0.008 seconds#015
2016-09-07T22:26:49Z    i-09dc17b16cada9939      docker-worker   c7114adcb737 - Started downloading#015
2016-09-07T22:26:50Z    i-09dc17b16cada9939      docker-worker   c7114adcb737 - Downloaded in 0.157 seconds#015
2016-09-07T22:26:51Z    i-09dc17b16cada9939      docker-worker   Digest: sha256:a45e5a4b4ef9c4b43cf4e115396c066fd00cf7ea8d80cde1c1d607bb42dc55aa#015
2016-09-07T22:26:51Z    i-09dc17b16cada9939      docker-worker   Status: Downloaded newer image for taskcluster/livelog:v3#015
2016-09-07T22:26:51Z    i-09dc17b16cada9939      docker-worker   Wed, 07 Sep 2016 22:26:51 GMT taskcluster-docker-worker:features:local_live_log image verified sha256:602e8afdbf6385f4fa160d795debc8cf1ec7553078725a4736243d48ac224fd8
2016-09-07T22:26:51Z    i-09dc17b16cada9939      kernel  [   68.447854] device veth58656df entered promiscuous mode
2016-09-07T22:26:51Z    i-09dc17b16cada9939      kernel  [   68.449093] IPv6: ADDRCONF(NETDEV_UP): veth58656df: link is not ready
2016-09-07T22:26:52Z    i-09dc17b16cada9939      kernel  [   68.661395] IPv6: ADDRCONF(NETDEV_CHANGE): veth58656df: link becomes ready
2016-09-07T22:26:52Z    i-09dc17b16cada9939      kernel  [   68.661451] docker0: port 2(veth58656df) entered forwarding state
2016-09-07T22:26:52Z    i-09dc17b16cada9939      kernel  [   68.661457] docker0: port 2(veth58656df) entered forwarding state
2016-09-07T22:26:52Z    i-09dc17b16cada9939      docker-worker   Wed, 07 Sep 2016 22:26:51 GMT taskcluster-docker-worker:features:local_live_log live log running: putUrl http://172.17.0.3:60022/log
2016-09-07T22:26:52Z    i-09dc17b16cada9939      docker-worker   Wed, 07 Sep 2016 22:26:51 GMT taskcluster-docker-worker:features:local_live_log live log running: publicUrl https://v2au6kqaaaavobxzqszzarjwqrcsezoqvfqfiscjqqmz5zq2.taskcluster-worker.net:32768/log/ZXHZ7PWvRouy4Z3GiIkxdQ
2016-09-07T22:27:04Z    i-09dc17b16cada9939      kernel  [   81.188070] docker0: port 1(vethac13487) entered forwarding state
2016-09-07T22:27:07Z    i-09dc17b16cada9939      docker-worker   {"type":"aliveness check","source":"top","provisionerId":"aws-provisioner-v1","workerId":"i-09dc17b16cada9939","workerGroup":"us-east-1a","workerType":"desktop-test-large","workerNodeType":"m3.large","alive":true,"uptime":83,"interval":30000}
2016-09-07T22:27:07Z    i-09dc17b16cada9939      kernel  [   83.680068] docker0: port 2(veth58656df) entered forwarding state
2016-09-07T22:27:19Z    i-09dc17b16cada9939      docker-worker   {"type":"task aborted","source":"top","provisionerId":"aws-provisioner-v1","workerId":"i-09dc17b16cada9939","workerGroup":"us-east-1a","workerType":"desktop-test-large","workerNodeType":"m3.large","taskId":"TYJNGdeyQuWW0i11qbHygw","runId":0,"exception":"","error":"\r\n[taskcluster:error] Task was aborted because states could not be created successfully. Error calling 'link' for relengAPIProxy : Failed to initialize releng API proxy service due to: Error: timed out while opening connection\r\n"}
Flags: needinfo?(garndt)
Removing ni?; we spoke on IRC, and papertrail searching through the UI only goes back 3 days.

Also, I noticed this in the new debug output I enabled (this one is for livelog, but the same might apply to other sidecar containers starting up):

Sep 14 02:38:42 docker-worker.aws-provisioner.us-east-1d.ami-acbfcfbb.m1-medium.i-03326461238f4eae0 kernel: [ 5350.981098] end_request: I/O error, dev xvdb, sector 536938976
Sep 14 02:38:42 docker-worker.aws-provisioner.us-east-1d.ami-acbfcfbb.m1-medium.i-03326461238f4eae0 kernel: [ 5350.981182] EXT4-fs warning (device dm-0): __ext4_read_dirblock:901: error reading directory block (ino 16777254, block 0)
For KKfUIJCaRxGGJ5AEIxr1ZA I don't see any such kernel errors:

Sep 13 11:07:04 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: use taskcluster-lib-scopes instead of taskcluster-base.utils
Sep 13 11:07:04 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: Tue, 13 Sep 2016 16:07:04 GMT taskcluster-docker-worker:features:local_live_log ensuring image
Sep 13 11:07:04 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: {"type":"ensure image","source":"top","provisionerId":"aws-provisioner-v1","workerId":"i-0700024b6ef21cb55","workerGroup":"us-west-1b","workerType":"desktop-test-xlarge","workerNodeType":"m3.xlarge","image":{"name":"taskcluster/relengapi-proxy:2.0.1","type":"docker-image"}}
Sep 13 11:07:04 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: Tue, 13 Sep 2016 16:07:04 GMT docker-worker:dockerImage pull image. Image: taskcluster/relengapi-proxy:2.0.1 Attempts: 1
Sep 13 11:07:05 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: b9583a207297 - Started downloading#015
Sep 13 11:07:05 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: b9583a207297 - Downloaded in 0.002 seconds#015
Sep 13 11:07:05 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: fb15e825cb68 - Started downloading#015
Sep 13 11:07:05 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: fb15e825cb68 - Downloaded in 0.002 seconds#015
Sep 13 11:07:05 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: 9943fffae777 - Started downloading#015
Sep 13 11:07:06 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: a3ed95caeb02 - Started downloading#015
Sep 13 11:07:06 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: a3ed95caeb02 - Downloaded in 0.001 seconds#015
Sep 13 11:07:06 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: e9930c279a31 - Started downloading#015
Sep 13 11:07:06 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: ce90c3007ecd - Started downloading#015
Sep 13 11:07:06 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: ce90c3007ecd - Downloaded in 0.069 seconds#015
Sep 13 11:07:06 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: e9930c279a31 - Downloaded in 0.569 seconds#015
Sep 13 11:07:06 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: f71a10e04d18 - Started downloading#015
Sep 13 11:07:07 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: f71a10e04d18 - Downloaded in 0.056 seconds#015
Sep 13 11:07:07 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: 9943fffae777 - Downloaded in 1.535 seconds#015
Sep 13 11:07:14 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: Digest: sha256:e4c2fc25fc5e1c48ba624446e0cc6061663f8e85a94496559b8658d0a3ae0b2f#015
Sep 13 11:07:14 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: Status: Downloaded newer image for taskcluster/relengapi-proxy:2.0.1#015
Sep 13 11:07:14 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: {"type":"ensure image","source":"top","provisionerId":"aws-provisioner-v1","workerId":"i-0700024b6ef21cb55","workerGroup":"us-west-1b","workerType":"desktop-test-xlarge","workerNodeType":"m3.xlarge","image":{"name":"taskcluster/livelog:v3","type":"docker-image"}}
Sep 13 11:07:14 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: Tue, 13 Sep 2016 16:07:14 GMT docker-worker:dockerImage pull image. Image: taskcluster/livelog:v3 Attempts: 1
Sep 13 11:07:14 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 kernel: [36247874.955847] aufs au_opts_parse:1155:docker[1936]: unknown option dirperm1
Sep 13 11:07:14 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 kernel: [36247875.054902] device vethbe83752 entered promiscuous mode
Sep 13 11:07:14 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 kernel: [36247875.055033] IPv6: ADDRCONF(NETDEV_UP): vethbe83752: link is not ready
Sep 13 11:07:14 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 kernel: [36247875.055037] docker0: port 1(vethbe83752) entered forwarding state
Sep 13 11:07:14 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 kernel: [36247875.055042] docker0: port 1(vethbe83752) entered forwarding state
Sep 13 11:07:14 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 kernel: [36247875.055751] docker0: port 1(vethbe83752) entered disabled state
Sep 13 11:07:14 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 kernel: [36247875.117258] IPv6: ADDRCONF(NETDEV_CHANGE): vethbe83752: link becomes ready
Sep 13 11:07:14 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 kernel: [36247875.117296] docker0: port 1(vethbe83752) entered forwarding state
Sep 13 11:07:14 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 kernel: [36247875.117303] docker0: port 1(vethbe83752) entered forwarding state
Sep 13 11:07:14 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 kernel: [36247875.117319] IPv6: ADDRCONF(NETDEV_CHANGE): docker0: link becomes ready
Sep 13 11:07:15 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: 61b90c8094ba - Started downloading#015
Sep 13 11:07:15 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: 61b90c8094ba - Downloaded in 0.002 seconds#015
Sep 13 11:07:15 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: 77c6c00e8b61 - Started downloading#015
Sep 13 11:07:15 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: a3ed95caeb02 - Started downloading#015
Sep 13 11:07:15 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: a3ed95caeb02 - Downloaded in 0 seconds#015
Sep 13 11:07:16 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: 77c6c00e8b61 - Downloaded in 0.076 seconds#015
Sep 13 11:07:16 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: 96b542aa9fae - Started downloading#015
Sep 13 11:07:16 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: 96b542aa9fae - Downloaded in 0 seconds#015
Sep 13 11:07:16 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: 91e17655d143 - Started downloading#015
Sep 13 11:07:16 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: 91e17655d143 - Downloaded in 0 seconds#015
Sep 13 11:07:16 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: bdd6cb71a978 - Started downloading#015
Sep 13 11:07:16 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: bdd6cb71a978 - Downloaded in 0.001 seconds#015
Sep 13 11:07:16 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: 03f992b49be8 - Started downloading#015
Sep 13 11:07:16 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: 03f992b49be8 - Downloaded in 0.006 seconds#015
Sep 13 11:07:16 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: 8a34723a0628 - Started downloading#015
Sep 13 11:07:16 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: 8a34723a0628 - Downloaded in 0.009 seconds#015
Sep 13 11:07:16 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: c7114adcb737 - Started downloading#015
Sep 13 11:07:16 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: c7114adcb737 - Downloaded in 0.079 seconds#015
Sep 13 11:07:17 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: Digest: sha256:a45e5a4b4ef9c4b43cf4e115396c066fd00cf7ea8d80cde1c1d607bb42dc55aa#015
Sep 13 11:07:17 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: Status: Downloaded newer image for taskcluster/livelog:v3#015
Sep 13 11:07:17 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: Tue, 13 Sep 2016 16:07:17 GMT taskcluster-docker-worker:features:local_live_log image verified sha256:602e8afdbf6385f4fa160d795debc8cf1ec7553078725a4736243d48ac224fd8
Sep 13 11:07:17 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 kernel: [36247877.905100] device veth05716d5 entered promiscuous mode
Sep 13 11:07:17 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 kernel: [36247877.905321] IPv6: ADDRCONF(NETDEV_UP): veth05716d5: link is not ready
Sep 13 11:07:17 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 kernel: [36247878.027766] IPv6: ADDRCONF(NETDEV_CHANGE): veth05716d5: link becomes ready
Sep 13 11:07:17 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 kernel: [36247878.027821] docker0: port 2(veth05716d5) entered forwarding state
Sep 13 11:07:17 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 kernel: [36247878.027828] docker0: port 2(veth05716d5) entered forwarding state
Sep 13 11:07:17 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: Tue, 13 Sep 2016 16:07:17 GMT taskcluster-docker-worker:features:local_live_log live log running: putUrl http://172.17.0.3:60022/log
Sep 13 11:07:17 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: Tue, 13 Sep 2016 16:07:17 GMT taskcluster-docker-worker:features:local_live_log live log running: publicUrl https://g233nciaaaavojhsbbefj57dsq7zi4qwlb5nfntupflwllc6.taskcluster-worker.net:32768/log/X51IcXG2R1Gyb-QhPODfMQ
Sep 13 11:07:29 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 kernel: [36247890.161309] docker0: port 1(vethbe83752) entered forwarding state
Sep 13 11:07:32 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 kernel: [36247893.039980] docker0: port 2(veth05716d5) entered forwarding state
Sep 13 11:07:32 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: {"type":"garbage collection started","source":"top","provisionerId":"aws-provisioner-v1","workerId":"i-0700024b6ef21cb55","workerGroup":"us-west-1b","workerType":"desktop-test-xlarge","workerNodeType":"m3.xlarge"}
Sep 13 11:07:32 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: {"type":"garbage collection finished","source":"top","provisionerId":"aws-provisioner-v1","workerId":"i-0700024b6ef21cb55","workerGroup":"us-west-1b","workerType":"desktop-test-xlarge","workerNodeType":"m3.xlarge"}
Sep 13 11:07:32 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: {"type":"aliveness check","source":"top","provisionerId":"aws-provisioner-v1","workerId":"i-0700024b6ef21cb55","workerGroup":"us-west-1b","workerType":"desktop-test-xlarge","workerNodeType":"m3.xlarge","alive":true,"uptime":174,"interval":30000}
Sep 13 11:07:44 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: {"type":"task aborted","source":"top","provisionerId":"aws-provisioner-v1","workerId":"i-0700024b6ef21cb55","workerGroup":"us-west-1b","workerType":"desktop-test-xlarge","workerNodeType":"m3.xlarge","taskId":"KKfUIJCaRxGGJ5AEIxr1ZA","runId":0,"exception":"","error":"\r\n[taskcluster:error] Task was aborted because states could not be created successfully. Error calling 'link' for relengAPIProxy : Failed to initialize releng API proxy service due to: Error: timed out while opening connection\r\n"} 

That same docker-worker instance later ran https://tools.taskcluster.net/task-inspector/#E9PtoaaXTkOrBdQSVEuu_g/ and https://tools.taskcluster.net/task-inspector/#VzTKwcU3SX2jlTvY2M06TQ/0 which both used the relengapi sidecar without any issues.

Could we run the docker daemon with a higher --log-level, maybe?
We could, but not all of the docker logs end up here; I think a lot of them go to /var/upstart/docker.log.
No longer blocks: 1307493
Depends on: 1307493
Whiteboard: [docker-link-failure] → [docker-link-failure][fennec-scouting]
Whiteboard: [docker-link-failure][fennec-scouting] → [docker-link-failure][fennec-scouting][stockwell-infra]
Is this the docker sidecar issue that we have seen in the past?
Flags: needinfo?(garndt)
This isn't the image-pull problem we fixed by bundling the image in the AMI.  This is an issue where, when the proxy tries to talk to relengapi, there is a timeout.  garbas said there might be something at play with retries on the relengapi side that he needs to look into.
Flags: needinfo?(garndt)
:garbas, this is our #1 intermittent (ignoring the virtualenv six issue, which was temporary) - can you take a look at this?
Flags: needinfo?(rgarbas)
:dustin Would it be possible to add retry logic (on connection failure, etc.) like we did in Bug 1364695?
Flags: needinfo?(rgarbas) → needinfo?(dustin)
I assume you mean retrying the docker linking? I'm not sure - Greg?
Flags: needinfo?(dustin) → needinfo?(garndt)
I meant retrying when opening the connection to relengapi in the Go code.
I don't think that's the issue here - the problem is in starting up the proxy, which does not involve any communication with the relengapi service.  We don't fetch a temporary token until the first HTTP request to relengAPI, which occurs once the task has started executing.

 https://github.com/taskcluster/relengapi-proxy/blob/master/proxy.go#L61
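
To illustrate the flow being described (a minimal sketch only, not the actual relengapi-proxy code; fetchTmpToken and the handler below are made-up stand-ins): the proxy can start and listen without touching relengapi at all, and the temporary token is only requested lazily when the first proxied request arrives.

package main

import (
    "net/http"
    "sync"
)

// fetchTmpToken stands in for the call that asks relengapi for a temporary
// token; in this sketch it only runs when a request actually needs one.
func fetchTmpToken() (string, error) {
    // ... contact relengapi here ...
    return "tmp-token", nil
}

type proxy struct {
    mu    sync.Mutex
    token string
}

func (p *proxy) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    p.mu.Lock()
    if p.token == "" {
        tok, err := fetchTmpToken() // first contact with relengapi happens here
        if err != nil {
            p.mu.Unlock()
            http.Error(w, err.Error(), http.StatusBadGateway)
            return
        }
        p.token = tok
    }
    p.mu.Unlock()
    // ... forward the request to relengapi using p.token ...
    w.WriteHeader(http.StatusOK)
}

func main() {
    // Starting the proxy and listening involves no relengapi traffic at all.
    http.ListenAndServe(":80", &proxy{})
}
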
Flags: needinfo?(garndt)
:dustin ok, sorry, I have no idea how to read Go code. I was mostly comparing the error with Bug 1364695.
:garndt, is there anything to be done to reduce these frequent and consistent failures?
Flags: needinfo?(garndt)
This is probably caused by network blips.  Either the network for that machine is bad at the time, or there is just a blip contacting the external service (in this case relengapi).  The mitigation here (although it might not be a complete solution) is to try to restart the sidecar proxy container when this happens.
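
As a rough illustration of that mitigation (docker-worker itself is Node.js; this is only a sketch, and the address, timeout, and attempt count below are made up), the idea is to keep re-dialing the sidecar's port with a backoff, possibly restarting the container in between, instead of aborting the task on the first connection timeout:

package main

import (
    "fmt"
    "net"
    "time"
)

// waitForProxy dials the sidecar's port a few times with exponential backoff
// instead of aborting the task on the first "timed out while opening connection".
func waitForProxy(addr string, attempts int) error {
    backoff := time.Second
    for i := 1; i <= attempts; i++ {
        conn, err := net.DialTimeout("tcp", addr, 10*time.Second)
        if err == nil {
            conn.Close()
            return nil
        }
        fmt.Printf("attempt %d/%d failed: %v; retrying in %v\n", i, attempts, err, backoff)
        // A restart of the proxy container could be attempted here as well.
        time.Sleep(backoff)
        backoff *= 2
    }
    return fmt.Errorf("proxy at %s never became reachable after %d attempts", addr, attempts)
}

func main() {
    // 172.17.0.2:80 is just an example container address on the docker0 bridge.
    if err := waitForProxy("172.17.0.2:80", 5); err != nil {
        fmt.Println("would abort the task:", err)
    }
}
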
Flags: needinfo?(garndt)
Component: General → Docker-Worker
Assignee: nobody → wcosta
One potential place where things could time out without retrying is here:
https://github.com/taskcluster/relengapi-proxy/blob/4f3e8febb0ff6b24ffe9da0aa47224c981b5ed64/relengapi.go#L46

It appears that on startup the relengapi proxy will retrieve a temp token, but there are no retries or backoff used here, so if it fails once, the task fails.
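
Concretely, the suggestion amounts to wrapping that token retrieval in a small retry loop with backoff, roughly like the sketch below (illustrative only; getTmpToken is a stand-in, not the actual function in relengapi.go):

package main

import (
    "fmt"
    "time"
)

// getTokenWithRetry retries a temporary-token request with exponential backoff
// instead of failing the whole task on a single error.
func getTokenWithRetry(getTmpToken func() (string, error)) (string, error) {
    backoff := 500 * time.Millisecond
    var lastErr error
    for attempt := 1; attempt <= 5; attempt++ {
        tok, err := getTmpToken()
        if err == nil {
            return tok, nil
        }
        lastErr = err
        time.Sleep(backoff)
        backoff *= 2
    }
    return "", fmt.Errorf("giving up after 5 attempts: %v", lastErr)
}

func main() {
    // Dummy fetcher for demonstration; always succeeds immediately.
    tok, err := getTokenWithRetry(func() (string, error) { return "tmp-token", nil })
    fmt.Println(tok, err)
}
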
That doesn't happen on startup, it happens on each request when there is no active token available.
  https://github.com/taskcluster/relengapi-proxy/blob/77a48a4f2225086f13416a44972b3f7671ec50ba/proxy.go#L82
Thank you for the clarification; I overlooked that part.  So now what's even more confusing is why a simple container that just listens on port 80 is failing.  Hopefully this at least narrows down where to look.
Whiteboard: [docker-link-failure][fennec-scouting][stockwell disable-recommended] → [docker-link-failure][fennec-scouting][stockwell infra]
Wander has a patch for docker-worker that we can try to roll out this week to retry these failures.
This has been high frequency this week. :wcosta, can you link to the bug/code for the patch that might help with this?
Flags: needinfo?(wcosta)
Fix was implemented and merged here:
https://github.com/taskcluster/docker-worker/pull/315

But we are working on resolving this PR before deploying:
https://github.com/taskcluster/docker-worker/pull/316

Looks like all of the issues reported recently were related to the relengapi proxy starting up.
Flags: needinfo?(wcosta)
The fix from Bug 1307493 has been rolled out, which should alleviate most of these issues.  No occurrences have been logged in 8 days, so closing out.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Component: Docker-Worker → Workers