Closed
Bug 1285090
Opened 8 years ago
Closed 7 years ago
Intermittent [taskcluster:error] Task was aborted because states could not be created successfully. Error calling 'link' for relengAPIProxy : Failed to initialize releng API proxy service due to: Error: timed out while opening connection
Categories
(Taskcluster :: Workers, defect)
Tracking
(Not tracked)
RESOLVED FIXED
People
(Reporter: intermittent-bug-filer, Assigned: wcosta)
Details
(Keywords: intermittent-failure, Whiteboard: [docker-link-failure][fennec-scouting][stockwell infra])
Filed by: philringnalda@gmail.com
https://treeherder.mozilla.org/logviewer.html#?job_id=31317553&repo=mozilla-inbound
https://queue.taskcluster.net/v1/task/WH6lxISlSN2SBpVZvcwyVw/runs/0/artifacts/public%2Flogs%2Flive_backing.log
Comment 9•8 years ago
[taskcluster 2016-07-07 02:59:10.024Z] Task ID: WH6lxISlSN2SBpVZvcwyVw
[taskcluster 2016-07-07 02:59:10.024Z] Worker ID: i-0e4a7c5a8b5039138
[taskcluster 2016-07-07 02:59:10.024Z] Worker Group: us-west-2a
[taskcluster 2016-07-07 02:59:10.024Z] Worker Type: opt-linux64
[taskcluster 2016-07-07 02:59:10.024Z] Public IP: 54.149.193.49
[taskcluster 2016-07-07 02:59:10.025Z] using cache "level-3-mozilla-inbound-tc-vcs" -> /home/worker/.tc-vcs
[taskcluster 2016-07-07 02:59:10.025Z] using cache "level-3-mozilla-inbound-build-linux64-artifact-workspace" -> /home/worker/workspace
[taskcluster 2016-07-07 02:59:10.025Z] using cache "tooltool-cache" -> /home/worker/tooltool-cache
[taskcluster:error] Task was aborted because states could not be created successfully. Error calling 'link' for relengAPIProxy : Failed to initialize releng API proxy service due to: Error: timed out while opening connection
[taskcluster 2016-07-07 02:59:40.345Z] Unsuccessful task run with exit code: -1 completed in 30.322 seconds
Comment 10•8 years ago
A more recent example: https://tools.taskcluster.net/task-inspector/#TYJNGdeyQuWW0i11qbHygw/0
However, I can't get Papertrail to show me that date range. Greg, how would you go about that? At a guess, this is docker having a hiccup... so maybe retries will be the fix?
Flags: needinfo?(garndt)
Updated•8 years ago
Whiteboard: [docker-link-failure]
Comment 11•8 years ago
Here's the syslog for that particular task:

2016-09-07T22:26:38Z i-09dc17b16cada9939 docker-worker {"type":"task start","source":"top","provisionerId":"aws-provisioner-v1","workerId":"i-09dc17b16cada9939","workerGroup":"us-east-1a","workerType":"desktop-test-large","workerNodeType":"m3.large","taskId":"TYJNGdeyQuWW0i11qbHygw","runId":0}
2016-09-07T22:26:38Z i-09dc17b16cada9939 docker-worker {"type":"next claim","source":"top","provisionerId":"aws-provisioner-v1","workerId":"i-09dc17b16cada9939","workerGroup":"us-east-1a","workerType":"desktop-test-large","workerNodeType":"m3.large","time":922781.5384615384,"primaryTaskId":"TYJNGdeyQuWW0i11qbHygw","primaryRunId":0,"taskId":"TYJNGdeyQuWW0i11qbHygw","runId":0,"takenUntil":"2016-09-07T22:46:37.873Z"}
2016-09-07T22:26:38Z i-09dc17b16cada9939 docker-worker Wed, 07 Sep 2016 22:26:38 GMT docker-worker:states taskId: TYJNGdeyQuWW0i11qbHygw at state: link
2016-09-07T22:26:38Z i-09dc17b16cada9939 docker-worker Wed, 07 Sep 2016 22:26:38 GMT taskcluster-docker-worker:features:local_live_log create live log container...
2016-09-07T22:26:38Z i-09dc17b16cada9939 docker-worker Wed, 07 Sep 2016 22:26:38 GMT taskcluster-docker-worker:features:bulk_log Created BulkLog using tempfile: /tmp/1473287198264.2383
2016-09-07T22:26:38Z i-09dc17b16cada9939 docker-worker use taskcluster-lib-scopes instead of taskcluster-base.utils
2016-09-07T22:26:38Z i-09dc17b16cada9939 docker-worker Wed, 07 Sep 2016 22:26:38 GMT taskcluster-docker-worker:features:local_live_log ensuring image
2016-09-07T22:26:38Z i-09dc17b16cada9939 docker-worker {"type":"ensure image","source":"top","provisionerId":"aws-provisioner-v1","workerId":"i-09dc17b16cada9939","workerGroup":"us-east-1a","workerType":"desktop-test-large","workerNodeType":"m3.large","image":{"name":"taskcluster/relengapi-proxy:2.0.1","type":"docker-image"}}
2016-09-07T22:26:38Z i-09dc17b16cada9939 docker-worker Wed, 07 Sep 2016 22:26:38 GMT docker-worker:dockerImage pull image. Image: taskcluster/relengapi-proxy:2.0.1 Attempts: 1
2016-09-07T22:26:38Z i-09dc17b16cada9939 docker-worker b9583a207297 - Started downloading
2016-09-07T22:26:38Z i-09dc17b16cada9939 docker-worker b9583a207297 - Downloaded in 0 seconds
2016-09-07T22:26:38Z i-09dc17b16cada9939 docker-worker fb15e825cb68 - Started downloading
2016-09-07T22:26:38Z i-09dc17b16cada9939 docker-worker fb15e825cb68 - Downloaded in 0.006 seconds
2016-09-07T22:26:38Z i-09dc17b16cada9939 docker-worker 9943fffae777 - Started downloading
2016-09-07T22:26:38Z i-09dc17b16cada9939 docker-worker a3ed95caeb02 - Started downloading
2016-09-07T22:26:38Z i-09dc17b16cada9939 docker-worker a3ed95caeb02 - Downloaded in 0 seconds
2016-09-07T22:26:38Z i-09dc17b16cada9939 docker-worker e9930c279a31 - Started downloading
2016-09-07T22:26:39Z i-09dc17b16cada9939 docker-worker ce90c3007ecd - Started downloading
2016-09-07T22:26:39Z i-09dc17b16cada9939 docker-worker ce90c3007ecd - Downloaded in 0.135 seconds
2016-09-07T22:26:39Z i-09dc17b16cada9939 docker-worker f71a10e04d18 - Started downloading
2016-09-07T22:26:39Z i-09dc17b16cada9939 docker-worker f71a10e04d18 - Downloaded in 0.117 seconds
2016-09-07T22:26:39Z i-09dc17b16cada9939 docker-worker e9930c279a31 - Downloaded in 0.833 seconds
2016-09-07T22:26:41Z i-09dc17b16cada9939 docker-worker 9943fffae777 - Downloaded in 2.217 seconds
2016-09-07T22:26:49Z i-09dc17b16cada9939 docker-worker Digest: sha256:e4c2fc25fc5e1c48ba624446e0cc6061663f8e85a94496559b8658d0a3ae0b2f
2016-09-07T22:26:49Z i-09dc17b16cada9939 docker-worker Status: Downloaded newer image for taskcluster/relengapi-proxy:2.0.1
2016-09-07T22:26:49Z i-09dc17b16cada9939 docker-worker {"type":"ensure image","source":"top","provisionerId":"aws-provisioner-v1","workerId":"i-09dc17b16cada9939","workerGroup":"us-east-1a","workerType":"desktop-test-large","workerNodeType":"m3.large","image":{"name":"taskcluster/livelog:v3","type":"docker-image"}}
2016-09-07T22:26:49Z i-09dc17b16cada9939 docker-worker Wed, 07 Sep 2016 22:26:49 GMT docker-worker:dockerImage pull image. Image: taskcluster/livelog:v3 Attempts: 1
2016-09-07T22:26:49Z i-09dc17b16cada9939 kernel [ 65.957915] aufs au_opts_parse:1155:docker[1856]: unknown option dirperm1
2016-09-07T22:26:49Z i-09dc17b16cada9939 kernel [ 66.040142] device vethac13487 entered promiscuous mode
2016-09-07T22:26:49Z i-09dc17b16cada9939 kernel [ 66.040310] IPv6: ADDRCONF(NETDEV_UP): vethac13487: link is not ready
2016-09-07T22:26:49Z i-09dc17b16cada9939 kernel [ 66.040315] docker0: port 1(vethac13487) entered forwarding state
2016-09-07T22:26:49Z i-09dc17b16cada9939 kernel [ 66.040320] docker0: port 1(vethac13487) entered forwarding state
2016-09-07T22:26:49Z i-09dc17b16cada9939 kernel [ 66.040655] docker0: port 1(vethac13487) entered disabled state
2016-09-07T22:26:49Z i-09dc17b16cada9939 kernel [ 66.131237] IPv6: ADDRCONF(NETDEV_CHANGE): vethac13487: link becomes ready
2016-09-07T22:26:49Z i-09dc17b16cada9939 kernel [ 66.131275] docker0: port 1(vethac13487) entered forwarding state
2016-09-07T22:26:49Z i-09dc17b16cada9939 kernel [ 66.131281] docker0: port 1(vethac13487) entered forwarding state
2016-09-07T22:26:49Z i-09dc17b16cada9939 kernel [ 66.131297] IPv6: ADDRCONF(NETDEV_CHANGE): docker0: link becomes ready
2016-09-07T22:26:49Z i-09dc17b16cada9939 docker-worker 77c6c00e8b61 - Started downloading
2016-09-07T22:26:49Z i-09dc17b16cada9939 docker-worker 61b90c8094ba - Started downloading
2016-09-07T22:26:49Z i-09dc17b16cada9939 docker-worker 61b90c8094ba - Downloaded in 0.001 seconds
2016-09-07T22:26:49Z i-09dc17b16cada9939 docker-worker a3ed95caeb02 - Started downloading
2016-09-07T22:26:49Z i-09dc17b16cada9939 docker-worker a3ed95caeb02 - Downloaded in 0.001 seconds
2016-09-07T22:26:49Z i-09dc17b16cada9939 docker-worker 96b542aa9fae - Started downloading
2016-09-07T22:26:49Z i-09dc17b16cada9939 docker-worker 96b542aa9fae - Downloaded in 0.001 seconds
2016-09-07T22:26:49Z i-09dc17b16cada9939 docker-worker 77c6c00e8b61 - Downloaded in 0.177 seconds
2016-09-07T22:26:49Z i-09dc17b16cada9939 docker-worker 91e17655d143 - Started downloading
2016-09-07T22:26:49Z i-09dc17b16cada9939 docker-worker 91e17655d143 - Downloaded in 0.004 seconds
2016-09-07T22:26:49Z i-09dc17b16cada9939 docker-worker bdd6cb71a978 - Started downloading
2016-09-07T22:26:49Z i-09dc17b16cada9939 docker-worker bdd6cb71a978 - Downloaded in 0.001 seconds
2016-09-07T22:26:49Z i-09dc17b16cada9939 docker-worker 03f992b49be8 - Started downloading
2016-09-07T22:26:49Z i-09dc17b16cada9939 docker-worker 03f992b49be8 - Downloaded in 0.04 seconds
2016-09-07T22:26:49Z i-09dc17b16cada9939 docker-worker 8a34723a0628 - Started downloading
2016-09-07T22:26:49Z i-09dc17b16cada9939 docker-worker 8a34723a0628 - Downloaded in 0.008 seconds
2016-09-07T22:26:49Z i-09dc17b16cada9939 docker-worker c7114adcb737 - Started downloading
2016-09-07T22:26:50Z i-09dc17b16cada9939 docker-worker c7114adcb737 - Downloaded in 0.157 seconds
2016-09-07T22:26:51Z i-09dc17b16cada9939 docker-worker Digest: sha256:a45e5a4b4ef9c4b43cf4e115396c066fd00cf7ea8d80cde1c1d607bb42dc55aa
2016-09-07T22:26:51Z i-09dc17b16cada9939 docker-worker Status: Downloaded newer image for taskcluster/livelog:v3
2016-09-07T22:26:51Z i-09dc17b16cada9939 docker-worker Wed, 07 Sep 2016 22:26:51 GMT taskcluster-docker-worker:features:local_live_log image verified sha256:602e8afdbf6385f4fa160d795debc8cf1ec7553078725a4736243d48ac224fd8
2016-09-07T22:26:51Z i-09dc17b16cada9939 kernel [ 68.447854] device veth58656df entered promiscuous mode
2016-09-07T22:26:51Z i-09dc17b16cada9939 kernel [ 68.449093] IPv6: ADDRCONF(NETDEV_UP): veth58656df: link is not ready
2016-09-07T22:26:52Z i-09dc17b16cada9939 kernel [ 68.661395] IPv6: ADDRCONF(NETDEV_CHANGE): veth58656df: link becomes ready
2016-09-07T22:26:52Z i-09dc17b16cada9939 kernel [ 68.661451] docker0: port 2(veth58656df) entered forwarding state
2016-09-07T22:26:52Z i-09dc17b16cada9939 kernel [ 68.661457] docker0: port 2(veth58656df) entered forwarding state
2016-09-07T22:26:52Z i-09dc17b16cada9939 docker-worker Wed, 07 Sep 2016 22:26:51 GMT taskcluster-docker-worker:features:local_live_log live log running: putUrl http://172.17.0.3:60022/log
2016-09-07T22:26:52Z i-09dc17b16cada9939 docker-worker Wed, 07 Sep 2016 22:26:51 GMT taskcluster-docker-worker:features:local_live_log live log running: publicUrl https://v2au6kqaaaavobxzqszzarjwqrcsezoqvfqfiscjqqmz5zq2.taskcluster-worker.net:32768/log/ZXHZ7PWvRouy4Z3GiIkxdQ
2016-09-07T22:27:04Z i-09dc17b16cada9939 kernel [ 81.188070] docker0: port 1(vethac13487) entered forwarding state
2016-09-07T22:27:07Z i-09dc17b16cada9939 docker-worker {"type":"aliveness check","source":"top","provisionerId":"aws-provisioner-v1","workerId":"i-09dc17b16cada9939","workerGroup":"us-east-1a","workerType":"desktop-test-large","workerNodeType":"m3.large","alive":true,"uptime":83,"interval":30000}
2016-09-07T22:27:07Z i-09dc17b16cada9939 kernel [ 83.680068] docker0: port 2(veth58656df) entered forwarding state
2016-09-07T22:27:19Z i-09dc17b16cada9939 docker-worker {"type":"task aborted","source":"top","provisionerId":"aws-provisioner-v1","workerId":"i-09dc17b16cada9939","workerGroup":"us-east-1a","workerType":"desktop-test-large","workerNodeType":"m3.large","taskId":"TYJNGdeyQuWW0i11qbHygw","runId":0,"exception":"","error":"\r\n[taskcluster:error] Task was aborted because states could not be created successfully. Error calling 'link' for relengAPIProxy : Failed to initialize releng API proxy service due to: Error: timed out while opening connection\r\n"}
Flags: needinfo?(garndt)
Comment 12•8 years ago
Removing the ni?; we spoke on IRC, and Papertrail searching through the UI only goes back 3 days. Also, I noticed this from the new debug output I enabled (it's for livelog, but it might also be the case for other sidecar containers starting):

Sep 14 02:38:42 docker-worker.aws-provisioner.us-east-1d.ami-acbfcfbb.m1-medium.i-03326461238f4eae0 kernel: [ 5350.981098] end_request: I/O error, dev xvdb, sector 536938976
Sep 14 02:38:42 docker-worker.aws-provisioner.us-east-1d.ami-acbfcfbb.m1-medium.i-03326461238f4eae0 kernel: [ 5350.981182] EXT4-fs warning (device dm-0): __ext4_read_dirblock:901: error reading directory block (ino 16777254, block 0)
Comment 13•8 years ago
For KKfUIJCaRxGGJ5AEIxr1ZA I don't see any such kernel errors:

Sep 13 11:07:04 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: use taskcluster-lib-scopes instead of taskcluster-base.utils
Sep 13 11:07:04 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: Tue, 13 Sep 2016 16:07:04 GMT taskcluster-docker-worker:features:local_live_log ensuring image
Sep 13 11:07:04 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: {"type":"ensure image","source":"top","provisionerId":"aws-provisioner-v1","workerId":"i-0700024b6ef21cb55","workerGroup":"us-west-1b","workerType":"desktop-test-xlarge","workerNodeType":"m3.xlarge","image":{"name":"taskcluster/relengapi-proxy:2.0.1","type":"docker-image"}}
Sep 13 11:07:04 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: Tue, 13 Sep 2016 16:07:04 GMT docker-worker:dockerImage pull image. Image: taskcluster/relengapi-proxy:2.0.1 Attempts: 1
Sep 13 11:07:05 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: b9583a207297 - Started downloading
Sep 13 11:07:05 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: b9583a207297 - Downloaded in 0.002 seconds
Sep 13 11:07:05 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: fb15e825cb68 - Started downloading
Sep 13 11:07:05 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: fb15e825cb68 - Downloaded in 0.002 seconds
Sep 13 11:07:05 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: 9943fffae777 - Started downloading
Sep 13 11:07:06 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: a3ed95caeb02 - Started downloading
Sep 13 11:07:06 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: a3ed95caeb02 - Downloaded in 0.001 seconds
Sep 13 11:07:06 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: e9930c279a31 - Started downloading
Sep 13 11:07:06 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: ce90c3007ecd - Started downloading
Sep 13 11:07:06 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: ce90c3007ecd - Downloaded in 0.069 seconds
Sep 13 11:07:06 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: e9930c279a31 - Downloaded in 0.569 seconds
Sep 13 11:07:06 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: f71a10e04d18 - Started downloading
Sep 13 11:07:07 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: f71a10e04d18 - Downloaded in 0.056 seconds
Sep 13 11:07:07 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: 9943fffae777 - Downloaded in 1.535 seconds
Sep 13 11:07:14 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: Digest: sha256:e4c2fc25fc5e1c48ba624446e0cc6061663f8e85a94496559b8658d0a3ae0b2f
Sep 13 11:07:14 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: Status: Downloaded newer image for taskcluster/relengapi-proxy:2.0.1
Sep 13 11:07:14 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: {"type":"ensure image","source":"top","provisionerId":"aws-provisioner-v1","workerId":"i-0700024b6ef21cb55","workerGroup":"us-west-1b","workerType":"desktop-test-xlarge","workerNodeType":"m3.xlarge","image":{"name":"taskcluster/livelog:v3","type":"docker-image"}}
Sep 13 11:07:14 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: Tue, 13 Sep 2016 16:07:14 GMT docker-worker:dockerImage pull image. Image: taskcluster/livelog:v3 Attempts: 1
Sep 13 11:07:14 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 kernel: [36247874.955847] aufs au_opts_parse:1155:docker[1936]: unknown option dirperm1
Sep 13 11:07:14 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 kernel: [36247875.054902] device vethbe83752 entered promiscuous mode
Sep 13 11:07:14 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 kernel: [36247875.055033] IPv6: ADDRCONF(NETDEV_UP): vethbe83752: link is not ready
Sep 13 11:07:14 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 kernel: [36247875.055037] docker0: port 1(vethbe83752) entered forwarding state
Sep 13 11:07:14 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 kernel: [36247875.055042] docker0: port 1(vethbe83752) entered forwarding state
Sep 13 11:07:14 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 kernel: [36247875.055751] docker0: port 1(vethbe83752) entered disabled state
Sep 13 11:07:14 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 kernel: [36247875.117258] IPv6: ADDRCONF(NETDEV_CHANGE): vethbe83752: link becomes ready
Sep 13 11:07:14 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 kernel: [36247875.117296] docker0: port 1(vethbe83752) entered forwarding state
Sep 13 11:07:14 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 kernel: [36247875.117303] docker0: port 1(vethbe83752) entered forwarding state
Sep 13 11:07:14 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 kernel: [36247875.117319] IPv6: ADDRCONF(NETDEV_CHANGE): docker0: link becomes ready
Sep 13 11:07:15 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: 61b90c8094ba - Started downloading
Sep 13 11:07:15 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: 61b90c8094ba - Downloaded in 0.002 seconds
Sep 13 11:07:15 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: 77c6c00e8b61 - Started downloading
Sep 13 11:07:15 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: a3ed95caeb02 - Started downloading
Sep 13 11:07:15 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: a3ed95caeb02 - Downloaded in 0 seconds
Sep 13 11:07:16 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: 77c6c00e8b61 - Downloaded in 0.076 seconds
Sep 13 11:07:16 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: 96b542aa9fae - Started downloading
Sep 13 11:07:16 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: 96b542aa9fae - Downloaded in 0 seconds
Sep 13 11:07:16 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: 91e17655d143 - Started downloading
Sep 13 11:07:16 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: 91e17655d143 - Downloaded in 0 seconds
Sep 13 11:07:16 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: bdd6cb71a978 - Started downloading
Sep 13 11:07:16 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: bdd6cb71a978 - Downloaded in 0.001 seconds
Sep 13 11:07:16 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: 03f992b49be8 - Started downloading
Sep 13 11:07:16 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: 03f992b49be8 - Downloaded in 0.006 seconds
Sep 13 11:07:16 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: 8a34723a0628 - Started downloading
Sep 13 11:07:16 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: 8a34723a0628 - Downloaded in 0.009 seconds
Sep 13 11:07:16 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: c7114adcb737 - Started downloading
Sep 13 11:07:16 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: c7114adcb737 - Downloaded in 0.079 seconds
Sep 13 11:07:17 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: Digest: sha256:a45e5a4b4ef9c4b43cf4e115396c066fd00cf7ea8d80cde1c1d607bb42dc55aa
Sep 13 11:07:17 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: Status: Downloaded newer image for taskcluster/livelog:v3
Sep 13 11:07:17 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: Tue, 13 Sep 2016 16:07:17 GMT taskcluster-docker-worker:features:local_live_log image verified sha256:602e8afdbf6385f4fa160d795debc8cf1ec7553078725a4736243d48ac224fd8
Sep 13 11:07:17 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 kernel: [36247877.905100] device veth05716d5 entered promiscuous mode
Sep 13 11:07:17 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 kernel: [36247877.905321] IPv6: ADDRCONF(NETDEV_UP): veth05716d5: link is not ready
Sep 13 11:07:17 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 kernel: [36247878.027766] IPv6: ADDRCONF(NETDEV_CHANGE): veth05716d5: link becomes ready
Sep 13 11:07:17 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 kernel: [36247878.027821] docker0: port 2(veth05716d5) entered forwarding state
Sep 13 11:07:17 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 kernel: [36247878.027828] docker0: port 2(veth05716d5) entered forwarding state
Sep 13 11:07:17 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: Tue, 13 Sep 2016 16:07:17 GMT taskcluster-docker-worker:features:local_live_log live log running: putUrl http://172.17.0.3:60022/log
Sep 13 11:07:17 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: Tue, 13 Sep 2016 16:07:17 GMT taskcluster-docker-worker:features:local_live_log live log running: publicUrl https://g233nciaaaavojhsbbefj57dsq7zi4qwlb5nfntupflwllc6.taskcluster-worker.net:32768/log/X51IcXG2R1Gyb-QhPODfMQ
Sep 13 11:07:29 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 kernel: [36247890.161309] docker0: port 1(vethbe83752) entered forwarding state
Sep 13 11:07:32 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 kernel: [36247893.039980] docker0: port 2(veth05716d5) entered forwarding state
Sep 13 11:07:32 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: {"type":"garbage collection started","source":"top","provisionerId":"aws-provisioner-v1","workerId":"i-0700024b6ef21cb55","workerGroup":"us-west-1b","workerType":"desktop-test-xlarge","workerNodeType":"m3.xlarge"}
Sep 13 11:07:32 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: {"type":"garbage collection finished","source":"top","provisionerId":"aws-provisioner-v1","workerId":"i-0700024b6ef21cb55","workerGroup":"us-west-1b","workerType":"desktop-test-xlarge","workerNodeType":"m3.xlarge"}
Sep 13 11:07:32 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: {"type":"aliveness check","source":"top","provisionerId":"aws-provisioner-v1","workerId":"i-0700024b6ef21cb55","workerGroup":"us-west-1b","workerType":"desktop-test-xlarge","workerNodeType":"m3.xlarge","alive":true,"uptime":174,"interval":30000}
Sep 13 11:07:44 docker-worker.aws-provisioner.us-west-1b.ami-ecf5b98c.m3-xlarge.i-0700024b6ef21cb55 docker-worker: {"type":"task aborted","source":"top","provisionerId":"aws-provisioner-v1","workerId":"i-0700024b6ef21cb55","workerGroup":"us-west-1b","workerType":"desktop-test-xlarge","workerNodeType":"m3.xlarge","taskId":"KKfUIJCaRxGGJ5AEIxr1ZA","runId":0,"exception":"","error":"\r\n[taskcluster:error] Task was aborted because states could not be created successfully. Error calling 'link' for relengAPIProxy : Failed to initialize releng API proxy service due to: Error: timed out while opening connection\r\n"}

That same docker-worker instance later ran https://tools.taskcluster.net/task-inspector/#E9PtoaaXTkOrBdQSVEuu_g/ and https://tools.taskcluster.net/task-inspector/#VzTKwcU3SX2jlTvY2M06TQ/0 which both used the relengapi sidecar without any issues. Could we run docker daemon with a higher --log-level, maybe?
Comment 14•8 years ago
We could, but not all of the docker logs are dumped here; a lot of them go into /var/upstart/docker.log, I think.
Updated•7 years ago
Whiteboard: [docker-link-failure] → [docker-link-failure][fennec-scouting]
Updated•7 years ago
Whiteboard: [docker-link-failure][fennec-scouting] → [docker-link-failure][fennec-scouting][stockwell-infra]
Comment 37•7 years ago
Is this the docker sidecar issue that we have seen in the past?
Flags: needinfo?(garndt)
Comment 38•7 years ago
This isn't the image-pulling problem we fixed by bundling the image in the AMI. This is an issue where the proxy times out when it tries to talk to relengapi. garbas said there might be something at play with retries on the relengapi side that he needs to look into.
Flags: needinfo?(garndt)
Comment 41•7 years ago
:garbas, this is our #1 intermittent (ignoring the virtualenv six issue, which was temporary). Can you take a look at this?
Flags: needinfo?(rgarbas)
Comment 45•7 years ago
:dustin Would it be possible to add retry logic (on connection failure, etc.) like we did in Bug 1364695?
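For illustration, a minimal Go sketch of the kind of connection-retry logic being asked about: retry the dial itself a few times before giving up, instead of treating the first "timed out while opening connection" as fatal. The address and retry parameters here are made up for the example; this is not code from relengapi-proxy.

package main

import (
	"fmt"
	"net"
	"time"
)

// dialWithRetries retries a TCP dial a few times instead of failing
// on the first timeout.
func dialWithRetries(addr string, attempts int) (net.Conn, error) {
	var err error
	for i := 0; i < attempts; i++ {
		conn, dialErr := net.DialTimeout("tcp", addr, 10*time.Second)
		if dialErr == nil {
			return conn, nil
		}
		err = dialErr
		time.Sleep(2 * time.Second) // short pause before the next attempt
	}
	return nil, fmt.Errorf("giving up after %d attempts: %v", attempts, err)
}

func main() {
	// "relengapi.example.org:443" is a placeholder address.
	conn, err := dialWithRetries("relengapi.example.org:443", 3)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	conn.Close()
}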
Flags: needinfo?(rgarbas) → needinfo?(dustin)
Comment 46•7 years ago
I assume you mean to retry the docker linking? I'm not sure - Greg?
Flags: needinfo?(dustin) → needinfo?(garndt)
Comment 47•7 years ago
I meant the retry when opening the connection to relengapi in the Go code.
Comment 48•7 years ago
I don't think that's the issue here: the problem is in starting up the proxy, which does not involve any communication with the relengapi service. We don't fetch a temporary token until the first HTTP request to relengAPI is made, which occurs once the task has started executing. https://github.com/taskcluster/relengapi-proxy/blob/master/proxy.go#L61
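To make the ordering concrete, here is a stripped-down Go sketch of the lazy token-fetch pattern described above. The names (proxy, fetchTempToken) and the port are illustrative stand-ins, not the real proxy.go code; the point is that starting the listener involves no relengapi traffic at all, so a link-time timeout cannot come from the token fetch.

package main

import (
	"log"
	"net/http"
	"sync"
)

// fetchTempToken is a hypothetical stand-in for the call that asks
// relengapi to issue a temporary token.
func fetchTempToken() (string, error) {
	return "tmp-token", nil
}

type proxy struct {
	mu    sync.Mutex
	token string
}

func (p *proxy) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	p.mu.Lock()
	if p.token == "" {
		// First proxied request: only now do we talk to relengapi.
		t, err := fetchTempToken()
		if err != nil {
			p.mu.Unlock()
			http.Error(w, "could not get relengapi token", http.StatusBadGateway)
			return
		}
		p.token = t
	}
	p.mu.Unlock()
	r.Header.Set("Authorization", "Bearer "+p.token)
	// ...forward the request to relengapi here...
	w.WriteHeader(http.StatusOK)
}

func main() {
	// Starting the listener does not contact relengapi.
	log.Fatal(http.ListenAndServe(":8080", &proxy{}))
}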
Flags: needinfo?(garndt)
Comment 49•7 years ago
:dustin OK, sorry, I have no idea how to read Go code. I was just comparing the error with the one in Bug 1364695.
Comment 60•7 years ago
:garndt, is there anything we can do to reduce these frequent and consistent failures?
Flags: needinfo?(garndt)
Comment 61•7 years ago
This is probably caused by network blips: either the network for that machine is bad at the time, or there is just a blip contacting the external service (in this case relengapi). The mitigation here (although it might not be a complete solution) is to try restarting the sidecar proxy container when this happens.
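Sketched in Go for brevity (docker-worker itself is JavaScript), the mitigation would look roughly like this; startSidecar and removeSidecar are hypothetical stand-ins for the worker's container start/teardown calls, and the stub always fails so the retry path is visible when run.

package main

import (
	"errors"
	"fmt"
	"time"
)

// Hypothetical container-management calls for this sketch.
func startSidecar() error { return errors.New("timed out while opening connection") }
func removeSidecar()      {}

// linkWithRetries restarts the sidecar container a few times before
// letting the task fail.
func linkWithRetries(attempts int) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = startSidecar(); err == nil {
			return nil
		}
		removeSidecar()             // tear down the failed container
		time.Sleep(5 * time.Second) // let a transient network blip pass
	}
	return fmt.Errorf("sidecar failed to start after %d attempts: %v", attempts, err)
}

func main() {
	if err := linkWithRetries(3); err != nil {
		fmt.Println("[taskcluster:error]", err)
	}
}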
Flags: needinfo?(garndt)
Updated•7 years ago
Component: General → Docker-Worker
Updated•7 years ago
Assignee: nobody → wcosta
Comment 63•7 years ago
One potential place where things could time out without a retry is here: https://github.com/taskcluster/relengapi-proxy/blob/4f3e8febb0ff6b24ffe9da0aa47224c981b5ed64/relengapi.go#L46 It appears that on startup the relengapi proxy retrieves a temp token, but no retries or backoff are used there, so if it fails once, the task fails.
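What adding retries with backoff around that call might look like, as a rough Go sketch; getToken is a hypothetical placeholder for the token request, not the real relengapi.go function.

package main

import (
	"errors"
	"fmt"
	"time"
)

// getToken stands in for the HTTP request to relengapi that issues a
// temporary token; the stub always fails for demonstration.
func getToken() (string, error) {
	return "", errors.New("timed out while opening connection")
}

// getTokenWithBackoff wraps the token request in retries with
// exponential backoff so one failed attempt no longer fails the task.
func getTokenWithBackoff(attempts int) (string, error) {
	delay := time.Second
	var err error
	for i := 0; i < attempts; i++ {
		var tok string
		if tok, err = getToken(); err == nil {
			return tok, nil
		}
		time.Sleep(delay)
		delay *= 2 // back off: 1s, 2s, 4s, ...
	}
	return "", fmt.Errorf("token request failed after %d attempts: %v", attempts, err)
}

func main() {
	if _, err := getTokenWithBackoff(4); err != nil {
		fmt.Println(err)
	}
}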
Comment 64•7 years ago
That doesn't happen on startup; it happens on each request when there is no active token available. https://github.com/taskcluster/relengapi-proxy/blob/77a48a4f2225086f13416a44972b3f7671ec50ba/proxy.go#L82
Comment 65•7 years ago
Thank you for the clarification, I overlooked that part. So now it is even more confusing why a simple container that just listens on port 80 is failing. Hopefully this at least narrows down where to look.
Updated•7 years ago
Whiteboard: [docker-link-failure][fennec-scouting][stockwell disable-recommended] → [docker-link-failure][fennec-scouting][stockwell infra]
Comment 76•7 years ago
Wander has a patch for docker-worker that we can try to roll out this week to retry these failures.
Comment 79•7 years ago
This has been high frequency this week. :wcosta, can you link to the bug/code for the patch that might help with this?
Flags: needinfo?(wcosta)
Comment 80•7 years ago
Fix was implemented and merged here: https://github.com/taskcluster/docker-worker/pull/315
But we are working on resolving this PR before deploying: https://github.com/taskcluster/docker-worker/pull/316
It looks like all of the issues reported recently were related to the relengapi proxy starting up.
Flags: needinfo?(wcosta)
Comment 85•7 years ago
Bug 1307493 has been rolled out, which should alleviate most of these issues. No failures logged in 8 days; closing out.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Updated•5 years ago
Component: Docker-Worker → Workers