Closed Bug 1243862 Opened 9 years ago Closed 9 years ago

Provisioner failed to complete provisioning iteration

Categories

(Taskcluster :: Services, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: garndt, Unassigned)

Details

At 19:28 UTC the provisioner started a provisioning iteration that never completed. After 25 minutes I restarted the provisioner, but I failed to save the Heroku logs. When I skimmed the logs, I didn't see any crashes that jumped out; mostly it was just the provisioner reporting stats to Influx. Dead Man's Snitch didn't report the provisioner as having failed to check in. The problem was noticed because a monitor on some Influx stats raised an alert when no iterations had been logged for more than 10 minutes. Also, around the same time, some EC2 instances were failing to pull files from S3. This is probably purely coincidental, but I thought I would mention it since there seemed to be a separate AWS hiccup around the same time.
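For reference, the kind of check that caught this (alert when no iteration has been logged for more than 10 minutes) can be sketched roughly as below. This is only an illustration: `isStalled` and `STALL_THRESHOLD_MS` are hypothetical names, not taken from the provisioner or from the actual monitor, and the 10-minute threshold is the only value that comes from this bug.

```javascript
// Hypothetical stall check; names are assumptions, not the provisioner's code.
// The 10-minute threshold mirrors the alert described in this bug.
const STALL_THRESHOLD_MS = 10 * 60 * 1000;

/**
 * Returns true when no iteration has completed within thresholdMs of nowMs.
 * lastCompletedMs is a millisecond timestamp, or null if no iteration
 * has ever completed.
 */
function isStalled(lastCompletedMs, nowMs, thresholdMs = STALL_THRESHOLD_MS) {
  if (lastCompletedMs === null) {
    return true; // nothing has ever completed, so treat it as stalled
  }
  return nowMs - lastCompletedMs > thresholdMs;
}

// Example: last completion at t = 0, checked 25 minutes later.
const minutes = (n) => n * 60 * 1000;
console.log(isStalled(0, minutes(25))); // stalled: 25 min exceeds the threshold
console.log(isStalled(0, minutes(5)));  // not stalled: still within 10 min
```

A check like this only looks at completed iterations, so it would fire for a hung iteration even while the process is still alive and reporting other stats.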
Flags: needinfo?(jhford)
I'm not sure whether the issue is with me receiving emails or with Dead Man's Snitch sending them, but I never received the email saying that the provisioner had failed to check in. I did receive an email saying that it had started reporting again, 40 minutes after I restarted it.
(In reply to Greg Arndt [:garndt] from comment #1)
> I'm not sure if it's an issue with me receiving emails or dead man snitch
> sending them but I didn't receive the email that the provisioner failed to
> check in, but I did receive an email that it started reporting again 40
> minutes after I restarted it.
I reliably get them, and I can't see much difference between the two emails. I've sent you a copy of the raw email so you can see if something is hitting filters you have.
Flags: needinfo?(jhford)
I restarted the provisioner again this morning due to a ~4h outage. I'm afraid I wasn't able to capture logs this time either.
I have typically gotten them, and I was receiving them last night during the downtime, so they're not being filtered. There must have been a hiccup yesterday that delayed the emails from them.
That is strange: they were arriving every 10 minutes and didn't all arrive at once. Is it possible there were two outages?
This issue was fixed by rewriting the aws-manager.js file, switching to Node 4, statically transpiling, and removing the multi-region-aws-sdk library.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Component: AWS-Provisioner → Services