Race condition between deployment of Remote Settings and database migration
Categories
(Cloud Services :: Operations: Kinto, task)
Tracking
(Not tracked)
People
(Reporter: autrilla, Assigned: sven)
Details
Our automated deployment pipeline for Remote Settings currently deploys a CloudFormation stack for the web servers. It then waits for the web servers to report that they're ready. At that point, it starts the database migration by connecting to one of these servers via SSH and executing a shell script. However, because the servers are part of an Auto Scaling Group, more of them can be created (or deleted) at any moment. It is therefore possible for our pipeline to pick a server that is not actually ready yet.
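One way to narrow the race is to check each instance's state right before connecting, instead of relying on the earlier "ready" signal. The following is only a sketch, not what the pipeline does today: `pick_ready_instance` is a hypothetical helper, and it assumes its input has the shape of the `Instances` list returned by the Auto Scaling `DescribeAutoScalingGroups` API (`InstanceId`, `LifecycleState`, `HealthStatus`).

```python
from typing import Iterable, Mapping, Optional


def pick_ready_instance(instances: Iterable[Mapping[str, str]]) -> Optional[str]:
    """Return the ID of the first instance that is in service and healthy.

    Each item is assumed to look like an entry of the "Instances" list in a
    DescribeAutoScalingGroups response. This helper is hypothetical.
    """
    for instance in instances:
        in_service = instance.get("LifecycleState") == "InService"
        healthy = instance.get("HealthStatus") == "Healthy"
        if in_service and healthy:
            return instance["InstanceId"]
    # Nothing is ready: the caller should retry or abort, not SSH anyway.
    return None
```

An instance that is still `Pending`, or already `Terminating`, is skipped, so the migration script would only ever be started on a server the ASG itself considers in service. The instance can still go away between the check and the SSH connection, so this shrinks the window rather than closing it.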
A quick fix could be to just change the value at https://github.com/mozilla-services/cloudops-deployment/blob/master/projects/kinto/ansible/templates/kinto-app.yml#L212 to the value in https://github.com/mozilla-services/cloudops-deployment/blob/master/projects/kinto/ansible/envs/prod.yml#L19
That way the ASG won't remove instances from the new stack while the stack is waiting for the DB migration and waiting to be promoted.
This never happened before because there had never been such a long pause (> 13 min) between deploying the new stack and running the DB migration script. Because of that pause, by the time I looked at it, the -205 stack had already been reduced to 0 instances after the long period of inactivity.
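Independently of the ASG setting, the migration step could refuse to run when the stack has no in-service instance, so a scaled-down stack fails loudly instead of being handed to the SSH step. A minimal sketch, assuming a hypothetical `fetch_instances` callable that returns the current `Instances` list of the group (for example from a `DescribeAutoScalingGroups` call); the timeout and interval values are placeholders, not values taken from the pipeline.

```python
import time
from typing import Callable, List, Mapping


def wait_for_in_service(
    fetch_instances: Callable[[], List[Mapping[str, str]]],
    timeout: float = 300.0,   # placeholder value
    interval: float = 10.0,   # placeholder value
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until some instance is InService and return its ID.

    Raises TimeoutError if no instance is in service before the deadline,
    e.g. because the group has been scaled down to zero.
    """
    deadline = clock() + timeout
    while True:
        for instance in fetch_instances():
            if instance.get("LifecycleState") == "InService":
                return instance["InstanceId"]
        if clock() >= deadline:
            raise TimeoutError("no in-service instance; refusing to run the migration")
        sleep(interval)
```

`sleep` and `clock` are parameters only so the loop can be exercised without real waiting; in the pipeline the defaults would be used.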
Updated 6 years ago