Open Bug 1524270 Opened 7 years ago Updated 6 years ago

Race condition between deployment of Remote Settings and database migration

Categories

(Cloud Services :: Operations: Kinto, task)

task
Not set
normal

Tracking

(Not tracked)

People

(Reporter: autrilla, Assigned: sven)

Details

Our automated deployment pipeline for Remote Settings currently deploys a CloudFormation stack for the web servers. It then waits for the web servers to report that they're ready. At that point, it starts the database migration by connecting to one of these servers via SSH and executing a shell script. However, because the servers are part of an Auto Scaling Group, more of them can be created (or deleted) at any moment. It is therefore possible for our pipeline to pick a server that is not actually ready yet.
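To illustrate the kind of check that would narrow this race, here is a minimal sketch that picks the migration host only from instances the Auto Scaling Group reports as both `InService` and `Healthy`. It works on a dict shaped like the response of boto3's `describe_auto_scaling_groups`. The function names `ready_instance_ids` and `pick_migration_host` are hypothetical and not taken from the actual pipeline. ASG health may also not be the same as the application being ready, so this is an assumption about what "ready" means here, not a complete fix.

```python
from typing import Any, Dict, List


def ready_instance_ids(asg_response: Dict[str, Any]) -> List[str]:
    """Return IDs of instances the ASG reports as InService and Healthy.

    `asg_response` is assumed to have the shape returned by boto3's
    autoscaling `describe_auto_scaling_groups` call.
    """
    ready = []
    for group in asg_response.get("AutoScalingGroups", []):
        for instance in group.get("Instances", []):
            if (
                instance.get("LifecycleState") == "InService"
                and instance.get("HealthStatus") == "Healthy"
            ):
                ready.append(instance["InstanceId"])
    return ready


def pick_migration_host(asg_response: Dict[str, Any]) -> str:
    """Pick one ready instance deterministically, or fail loudly.

    Hypothetical helper: raising here lets the pipeline retry instead of
    connecting over SSH to a server that is still starting up.
    """
    ready = ready_instance_ids(asg_response)
    if not ready:
        raise RuntimeError("no InService/Healthy instance available for migration")
    return sorted(ready)[0]
```

Because the group can change between this check and the SSH connection, the pipeline would still need to retry on failure; the check only reduces the window in which a not-yet-ready server can be selected.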

A quick fix could be to just change the value at https://github.com/mozilla-services/cloudops-deployment/blob/master/projects/kinto/ansible/templates/kinto-app.yml#L212 to the value in https://github.com/mozilla-services/cloudops-deployment/blob/master/projects/kinto/ansible/envs/prod.yml#L19

That way the ASG won't remove instances from the new stack while it is waiting for the DB migration and for promotion.

This never happened before because there had never been such a long pause (> 13 min) between deploying the new stack and running the DB migration script. Because of that, by the time I looked at it, the -205 stack had already been reduced to 0 instances after the long period of inactivity.

Assignee: autrilla → sven