Closed Bug 1445697 Opened 7 years ago Closed 6 years ago

Workers getting 403's renewing statsum/signalfx creds

Categories

(Taskcluster :: Workers, enhancement, P1)


Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dustin, Unassigned)

References

Details

We're seeing lots of these, causing lots of alerts about excessive 403s. Take docker-worker.aws-provisioner.us-west-1c.ami-cc909aac.m3-large.i-066bcbd52ede5fb37 for example: https://papertrailapp.com/systems/1687026791/events?focus=910560469327179819

startup:

Mar 14 12:49:13 docker-worker.aws-provisioner.us-west-1c.ami-cc909aac.m3-large.i-066bcbd52ede5fb37 rsyslogd: [origin software="rsyslogd" swVersion="7.4.4" x-pid="1496" x-info="http://www.rsyslog.com"] start
...
Mar 14 12:49:41 docker-worker.aws-provisioner.us-west-1c.ami-cc909aac.m3-large.i-066bcbd52ede5fb37 docker-worker: {"type":"configure","source":"host/aws","url":"http://169.254.169.254/latest"}

takes a task:

Mar 14 12:49:44 docker-worker.aws-provisioner.us-west-1c.ami-cc909aac.m3-large.i-066bcbd52ede5fb37 docker-worker: {"type":"task start","source":"top","provisionerId":"aws-provisioner-v1","workerId":"i-066bcbd52ede5fb37","workerGroup":"us-west-1","workerType":"gecko-t-linux-large","workerNodeType":"m3.large","taskId":"Re3EkNjpT0Ocb8qvdsEaeg","runId":0}
...
Mar 14 13:17:54 docker-worker.aws-provisioner.us-west-1c.ami-cc909aac.m3-large.i-066bcbd52ede5fb37 docker-worker: {"type":"task resolved","source":"top","provisionerId":"aws-provisioner-v1","workerId":"i-066bcbd52ede5fb37","workerGroup":"us-west-1","workerType":"gecko-t-linux-large","workerNodeType":"m3.large","taskId":"Re3EkNjpT0Ocb8qvdsEaeg","runId":0,"taskState":"completed"}

takes a task:

Mar 14 13:20:52 docker-worker.aws-provisioner.us-west-1c.ami-cc909aac.m3-large.i-066bcbd52ede5fb37 docker-worker: {"type":"task start","source":"top","provisionerId":"aws-provisioner-v1","workerId":"i-066bcbd52ede5fb37","workerGroup":"us-west-1","workerType":"gecko-t-linux-large","workerNodeType":"m3.large","taskId":"GkElnORJT7ul6Vc4ilkjww","runId":1}

OOM:

Mar 14 13:27:12 docker-worker.aws-provisioner.us-west-1c.ami-cc909aac.m3-large.i-066bcbd52ede5fb37 docker-worker: Uncaught Exception! Attempting to report to Sentry and crash.
Mar 14 13:27:13 docker-worker.aws-provisioner.us-west-1c.ami-cc909aac.m3-large.i-066bcbd52ede5fb37 docker-worker: Error: spawn ENOMEM
...
Mar 14 13:29:14 docker-worker.aws-provisioner.us-west-1c.ami-cc909aac.m3-large.i-066bcbd52ede5fb37 kernel: [ 2409.357896] Out of memory: Kill process 11672 (firefox) score 869 or sacrifice child
Mar 14 13:29:14 docker-worker.aws-provisioner.us-west-1c.ami-cc909aac.m3-large.i-066bcbd52ede5fb37 kernel: [ 2409.362560] Killed process 11725 (Web Content) total-vm:1649800kB, anon-rss:42604kB, file-rss:7268kB
Mar 14 13:29:15 docker-worker.aws-provisioner.us-west-1c.ami-cc909aac.m3-large.i-066bcbd52ede5fb37 docker-worker: Failed to report error to Sentry after timeout!

and here's the issue, I think:

Mar 14 13:29:15 docker-worker.aws-provisioner.us-west-1c.ami-cc909aac.m3-large.i-066bcbd52ede5fb37 kernel: [ 2410.534578] init: docker-worker main process ended, respawning

So the respawned process can't get any credentials, since the secret has already been deleted. Should we just shut down in this case, instead?
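The proposed behavior could look something like the following. This is a minimal sketch only, in Python with hypothetical names (`get_secret`, `SecretGone`, `shutdown`); the real docker-worker is a Node.js daemon and its provisioner-secret API is not reproduced here. The idea is simply that a restarted process which finds its one-time secret already consumed should terminate the host rather than retry and generate 403s:

```python
class SecretGone(Exception):
    """Hypothetical: raised when the provisioner secret was already consumed."""


def fetch_credentials(get_secret, token):
    """Try to redeem the one-time startup secret; None if it is gone."""
    try:
        return get_secret(token)
    except SecretGone:
        return None


def start_worker(get_secret, token, shutdown):
    """Start the worker, or shut the instance down if credentials are gone.

    get_secret and shutdown are stand-ins for the provisioner secret call
    and the host-shutdown hook; both are assumptions, not real APIs.
    """
    creds = fetch_credentials(get_secret, token)
    if creds is None:
        # The secret is deleted after first use, so a respawned process can
        # never obtain credentials; shutting down avoids an endless 403 loop.
        shutdown()
        return "shutdown"
    # Normal path: proceed with the redeemed credentials.
    return "running"
```

The key design point is that the "secret already gone" case is treated as terminal rather than retryable, which matches the observation that init respawning docker-worker after the OOM kill is what produces the 403 storm.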
Blocks: tc-stability
Priority: -- → P1
Component: Operations → Docker-Worker
Wander, it seems like the underlying issue is that docker-worker restarted on the host. I think we've decided to just live with that problem? If so, let's WONTFIX.
Flags: needinfo?(wcosta)
Status: NEW → RESOLVED
Closed: 6 years ago
Flags: needinfo?(wcosta)
Resolution: --- → FIXED
Component: Docker-Worker → Workers