Closed Bug 1433020 Opened 7 years ago Closed 7 years ago

Cloud-mirror updates fail

Categories

(Taskcluster :: Services, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dustin, Assigned: jhford)

We tried to deploy an updated cloud-mirror today, and it failed. About 40 minutes after the deployment, the service went down with steadily increasing memory usage and H12 (request timeout) errors. The logs were full of Sentry tracebacks, and at the same time the auth service was returning hundreds of 401s per second to cloud-mirror's requests for Sentry tokens. There were also some unhandled Promise rejections. A plausible explanation: some exception occurred, reporting it to Sentry failed with an unhandled Promise rejection, the HTTP connection was left pending, and memory consumption piled up in the form of open HTTP connections. Rolling back the service in Heroku seemed to fix things.
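To make the suspected failure mode concrete, here is a minimal sketch of the defensive pattern it points to: a failing or hanging error reporter should never keep the request from getting a response. This is not cloud-mirror's code. The mention of Promises suggests a JavaScript service, and Python's asyncio is used here only to illustrate the idea. The names `report_safely`, `handle_request`, and `REPORT_TIMEOUT_SECONDS`, and the 2-second bound, are all hypothetical.

```python
import asyncio

REPORT_TIMEOUT_SECONDS = 2.0  # assumed bound; not taken from the real service


async def report_safely(report, error, timeout=REPORT_TIMEOUT_SECONDS):
    """Try to report `error`; never raise and never wait longer than `timeout`."""
    try:
        await asyncio.wait_for(report(error), timeout)
        return True
    except Exception:
        # Covers a reporter that fails outright (e.g. a 401 on the token
        # request) as well as the timeout raised for a reporter that hangs.
        return False


async def handle_request(work, report, report_timeout=REPORT_TIMEOUT_SECONDS):
    """Run `work`; on failure, attempt a report but always return a response."""
    try:
        return 200, await work()
    except Exception as error:
        await report_safely(report, error, report_timeout)
        return 500, "internal error"


if __name__ == "__main__":
    async def failing_work():
        raise RuntimeError("boom")

    async def rejecting_reporter(error):
        # Hypothetical stand-in for a reporter whose credentials are rejected.
        raise PermissionError("401 while fetching reporting credentials")

    print(asyncio.run(handle_request(failing_work, rejecting_reporter)))
```

With this shape, a reporter that errors or stalls costs at most `report_timeout` seconds per failed request, and the connection is always closed with a 500 instead of being left open.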
Per email, John has successfully deployed c-m. If everything seems OK, this can be closed!
Assignee: nobody → jhford
I have deployed the front end to Heroku and the backend to Docker Cloud. Both deployments went smoothly and I have not seen any issues. As a precaution, I have also disabled Sentry reporting in this service and instead set up Heroku alerts to monitor for 400 and 500 errors, which would be the primary signal of something going wrong. The only remaining issue is to verify that the backend has truly deployed: the Docker Cloud UI and CLI are so confusing and hard to use that it's nearly impossible to see what's going on!
I was able to log into a terminal on a running instance in Docker Cloud and verified that the config change has been deployed there. Unless this blows up, we're safe to close.
Should we also file bugs to track enabling Sentry and statsum for this service?
I've set up a Heroku alert, which is a good enough signal for whether cloud-mirror is down. Having Docker Cloud in play makes things a lot more complicated than they'd otherwise be, so it's not really worth the effort in my mind. Cloud-mirror deployments look like they're working again, so I'm going to mark this as RESOLVED->FIXED.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Component: Queue → Services