Bug 1445697: Workers getting 403s renewing statsum/signalfx creds
Status: RESOLVED FIXED (opened 7 years ago, closed 6 years ago)
Component: Taskcluster :: Workers (enhancement, P1)
Tracking: not tracked
Reporter: dustin; Assignee: unassigned
We're seeing a lot of these, and they're generating a lot of alerts about excessive 403s.
For example, on docker-worker.aws-provisioner.us-west-1c.ami-cc909aac.m3-large.i-066bcbd52ede5fb37:
https://papertrailapp.com/systems/1687026791/events?focus=910560469327179819
startup:
Mar 14 12:49:13 docker-worker.aws-provisioner.us-west-1c.ami-cc909aac.m3-large.i-066bcbd52ede5fb37 rsyslogd: [origin software="rsyslogd" swVersion="7.4.4" x-pid="1496" x-info="http://www.rsyslog.com"] start
...
Mar 14 12:49:41 docker-worker.aws-provisioner.us-west-1c.ami-cc909aac.m3-large.i-066bcbd52ede5fb37 docker-worker: {"type":"configure","source":"host/aws","url":"http://169.254.169.254/latest"}
takes a task:
Mar 14 12:49:44 docker-worker.aws-provisioner.us-west-1c.ami-cc909aac.m3-large.i-066bcbd52ede5fb37 docker-worker: {"type":"task start","source":"top","provisionerId":"aws-provisioner-v1","workerId":"i-066bcbd52ede5fb37","workerGroup":"us-west-1","workerType":"gecko-t-linux-large","workerNodeType":"m3.large","taskId":"Re3EkNjpT0Ocb8qvdsEaeg","runId":0}
...
Mar 14 13:17:54 docker-worker.aws-provisioner.us-west-1c.ami-cc909aac.m3-large.i-066bcbd52ede5fb37 docker-worker: {"type":"task resolved","source":"top","provisionerId":"aws-provisioner-v1","workerId":"i-066bcbd52ede5fb37","workerGroup":"us-west-1","workerType":"gecko-t-linux-large","workerNodeType":"m3.large","taskId":"Re3EkNjpT0Ocb8qvdsEaeg","runId":0,"taskState":"completed"}
takes a task:
Mar 14 13:20:52 docker-worker.aws-provisioner.us-west-1c.ami-cc909aac.m3-large.i-066bcbd52ede5fb37 docker-worker: {"type":"task start","source":"top","provisionerId":"aws-provisioner-v1","workerId":"i-066bcbd52ede5fb37","workerGroup":"us-west-1","workerType":"gecko-t-linux-large","workerNodeType":"m3.large","taskId":"GkElnORJT7ul6Vc4ilkjww","runId":1}
OOM:
Mar 14 13:27:12 docker-worker.aws-provisioner.us-west-1c.ami-cc909aac.m3-large.i-066bcbd52ede5fb37 docker-worker: Uncaught Exception! Attempting to report to Sentry and crash.
Mar 14 13:27:13 docker-worker.aws-provisioner.us-west-1c.ami-cc909aac.m3-large.i-066bcbd52ede5fb37 docker-worker: Error: spawn ENOMEM
...
Mar 14 13:29:14 docker-worker.aws-provisioner.us-west-1c.ami-cc909aac.m3-large.i-066bcbd52ede5fb37 kernel: [ 2409.357896] Out of memory: Kill process 11672 (firefox) score 869 or sacrifice child
Mar 14 13:29:14 docker-worker.aws-provisioner.us-west-1c.ami-cc909aac.m3-large.i-066bcbd52ede5fb37 kernel: [ 2409.362560] Killed process 11725 (Web Content) total-vm:1649800kB, anon-rss:42604kB, file-rss:7268kB
Mar 14 13:29:15 docker-worker.aws-provisioner.us-west-1c.ami-cc909aac.m3-large.i-066bcbd52ede5fb37 docker-worker: Failed to report error to Sentry after timeout!
and here's the issue, I think:
Mar 14 13:29:15 docker-worker.aws-provisioner.us-west-1c.ami-cc909aac.m3-large.i-066bcbd52ede5fb37 kernel: [ 2410.534578] init: docker-worker main process ended, respawning
So the respawned process can't get any credentials, since the secret has already been deleted.
Should we just shut down in this case, instead?
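Roughly, the bootstrap that makes this fatal looks like the following (a minimal sketch with assumed client and method names, not the actual docker-worker source):

// Illustrative sketch only, not the real docker-worker bootstrap; the
// provisioner.getSecret / removeSecret calls and the user-data shape
// are assumptions made for this example.
async function bootstrapCredentials(provisioner, userData) {
  // The secret is keyed by a one-time token handed to the instance at launch.
  const { credentials } = await provisioner.getSecret(userData.securityToken);

  // The worker deletes the secret right after reading it so the token
  // cannot be replayed.
  await provisioner.removeSecret(userData.securityToken);

  // A respawned docker-worker on the same host therefore has nothing left to
  // fetch; it ends up without usable credentials and gets 403s when it tries
  // to renew the statsum/signalfx credentials.
  return credentials;
}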
Updated • 7 years ago
Blocks: tc-stability
Priority: -- → P1

Updated • 7 years ago
Component: Operations → Docker-Worker
Comment 1 • Reporter (dustin) • 6 years ago
Wander, it seems like the underlying issue is that docker-worker restarted on the host. I think we've decided to just live with that problem? If so, let's WONTFIX this.
Flags: needinfo?(wcosta)
Comment 2 • 6 years ago
We now shut down on this error [1].
[1] https://github.com/taskcluster/docker-worker/blob/master/src/lib/host/aws.js#L136
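Roughly, the behavior there is something like this (a paraphrased sketch with assumed names and error handling, not the code at the linked line):

// Paraphrased sketch of the fix, not the actual code in aws.js; the error
// handling and the shutdown call are assumptions made for this example.
const { execSync } = require('child_process');

async function getSecrets(provisioner, token) {
  try {
    return await provisioner.getSecret(token);
  } catch (err) {
    // A respawned worker (e.g. after the OOM-triggered restart above) lands
    // here because the secret was already consumed; rather than keep running
    // without credentials and spamming 403s, take the instance down.
    console.error(`Could not fetch worker secret: ${err.stack || err}`);
    execSync('shutdown -h now');
    throw err;
  }
}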
Status: NEW → RESOLVED
Closed: 6 years ago
Flags: needinfo?(wcosta)
Resolution: --- → FIXED
Updated • 6 years ago
Component: Docker-Worker → Workers