Bug 1152987
Opened 10 years ago
Closed 10 years ago
DEPLOY updated Onyx App CFN stacks
Categories: Content Services Graveyard :: Tiles: Ops, defect
Tracking: Not tracked
Status: RESOLVED FIXED
People: Reporter: mostlygeek, Assigned: relud
This is a two-in-one deploy because the two changes make more sense shipped together.
#1. The Onyx ELB CFN has been updated from the Jenkins-generated template to a static+parameters template. No other major changes were implemented.
#2. The Onyx Application stack was updated with these changes:
- added the Datadog agent to the app servers
- changed Onyx statsd to point at the Datadog agent on 127.0.0.1 instead of the shared Graphite stack
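A minimal sketch of what "statsd pointed at the local agent" means on the wire, assuming the agent listens on the conventional StatsD UDP port 8125 (the bug only gives the address 127.0.0.1). The metric name `onyx.canary.heartbeat` and both helper functions are hypothetical and are not taken from the Onyx code base.

```python
import socket

# Assumption: the local Datadog agent accepts StatsD datagrams on UDP 8125,
# the conventional StatsD default. The bug itself only names 127.0.0.1.
AGENT_ADDR = ("127.0.0.1", 8125)


def format_counter(name: str, value: int = 1) -> bytes:
    """Encode a counter in the plain StatsD wire format: <name>:<value>|c"""
    return f"{name}:{value}|c".encode("ascii")


def send_counter(name: str, value: int = 1, addr=AGENT_ADDR) -> int:
    """Send one counter datagram over UDP and return the number of bytes sent."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(format_counter(name, value), addr)


if __name__ == "__main__":
    # Hypothetical metric name, used only to exercise the local agent.
    send_counter("onyx.canary.heartbeat")
```

Because StatsD is fire-and-forget UDP, a successful send does not prove the agent received anything; the "verify statsd is flowing to datadog" check still has to happen on the Datadog side.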
This is the deployment plan:
- create the new ELBs as a new CloudFormation stack
- create the new app servers as a new CloudFormation stack
  - deploy with a fixed ASG size of 11 servers to prevent autoscaling down
- using R53 load balancing:
  - split traffic 90% (old) / 10% (new)
    - verify data is flowing out normally (S3, SNS) to the processor
    - verify statsd is flowing to Datadog
  - split traffic 75 / 25, wait 10 min
  - split traffic 50 / 50, wait 10 min
  - split traffic 25 / 75, wait 10 min
  - split traffic 0 / 100, wait 10 min
- scale down the old cluster to zero Onyx servers
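The traffic split steps above could be expressed as a sequence of Route 53 weighted record updates. The sketch below only builds the request payloads and does not call AWS. It assumes the split is done with two weighted CNAME records, and the record name, ELB DNS names, and TTL are hypothetical placeholders.

```python
# Traffic split steps from the plan, as (old %, new %).
SPLITS = [(90, 10), (75, 25), (50, 50), (25, 75), (0, 100)]


def weighted_change_batch(record_name, old_target, new_target,
                          old_weight, new_weight, ttl=60):
    """Build a ChangeBatch dict that UPSERTs two weighted CNAME records."""

    def change(set_id, target, weight):
        return {
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "CNAME",
                "SetIdentifier": set_id,   # distinguishes the weighted records
                "Weight": weight,          # Route 53 accepts 0-255
                "TTL": ttl,                # assumption: short TTL for fast shifts
                "ResourceRecords": [{"Value": target}],
            },
        }

    return {"Changes": [change("old", old_target, old_weight),
                        change("new", new_target, new_weight)]}


def rollout_batches(record_name, old_target, new_target):
    """Yield one ChangeBatch per step of the split schedule, in order."""
    for old_pct, new_pct in SPLITS:
        if old_pct + new_pct != 100:
            raise ValueError(f"split {old_pct}/{new_pct} does not sum to 100")
        yield weighted_change_batch(record_name, old_target, new_target,
                                    old_pct, new_pct)


if __name__ == "__main__":
    # Hypothetical names; the bug does not give the real record or ELB names.
    for batch in rollout_batches("onyx.example.com.",
                                 "old-elb.example.com",
                                 "new-elb.example.com"):
        print([c["ResourceRecordSet"]["Weight"] for c in batch["Changes"]])
```

Each batch could then be passed as the `ChangeBatch` argument of boto3's Route 53 `change_resource_record_sets` call, with the verification or 10 minute wait from the plan between steps.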
Updated by Reporter • 10 years ago
Assignee: nobody → dthornton
Comment 1 (Assignee) • 10 years ago
We aren't set up for DNS-style migrations of Onyx right now. The current method is:
- deploy an ASG of size 1 behind the live ELB as a canary (from Jenkins)
- check that it's healthy (manually)
- update that stack to have size 3-42 (from Jenkins)
- wait for scaling to complete (manually)
- reduce the previous stack size gradually to 0
- wait 7 days
- destroy the old stack
Comment 2 (Reporter) • 10 years ago
Notes from IRC discussion:
Change the deployment plan to be:
- deploy an ASG of size 1 behind the live ELB as a canary (via Jenkins)
- check that it's healthy (manually) and sending statsd to Datadog
- update the new stack to a minimum of 11 Onyx servers (via Jenkins)
- wait for scaling to complete: 11 healthy servers in the ELB from the new ASG
- reduce the previous stack size to 0 (manually)
- wait until all old Onyx instances are out of the ELB
- update the new stack to have an ASG minSize of 3 (via Jenkins)
  - this will allow auto-scaling down again
- wait at least 3 days, then destroy the old Onyx stack (manually)
  - create a bug for this
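The manual checks in the revised plan amount to two gates on the set of instances registered in the ELB. The sketch below models those gates as pure functions so the ordering is explicit. The `Instance` type, the function names, and the action strings are all hypothetical, and in practice the instance list would come from the ELB's health reporting rather than being built by hand.

```python
from dataclasses import dataclass

WAIT_FOR_SCALE_UP = "wait for new stack to scale up"
DRAIN_OLD_STACK = "scale old stack to 0 and wait for it to drain"
LOWER_MIN_SIZE = "lower new stack ASG minSize to 3"


@dataclass(frozen=True)
class Instance:
    instance_id: str
    stack: str      # "old" or "new"
    healthy: bool


def new_stack_ready(elb_instances, required=11):
    """True once at least `required` healthy new-stack instances are in the ELB."""
    healthy_new = [i for i in elb_instances if i.stack == "new" and i.healthy]
    return len(healthy_new) >= required


def old_stack_drained(elb_instances):
    """True once no old-stack instance remains registered in the ELB."""
    return not any(i.stack == "old" for i in elb_instances)


def next_action(elb_instances, required=11):
    """Return the next step of the plan for the current ELB membership."""
    if not new_stack_ready(elb_instances, required):
        return WAIT_FOR_SCALE_UP
    if not old_stack_drained(elb_instances):
        return DRAIN_OLD_STACK
    return LOWER_MIN_SIZE
```

Holding the new ASG at a minimum of 11 until the old instances have left the ELB keeps capacity from dropping while both stacks are in rotation; only after the drain does lowering minSize to 3 let auto-scaling shrink the fleet again.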
Updated by Reporter • 10 years ago
Summary: DEPLOY updated Onyx ELB and Onyx App CFN stacks → DEPLOY updated Onyx App CFN stacks
Comment 3 (Reporter) • 10 years ago
This deployment is scheduled for 2PM today.
Comment 4 (Assignee) • 10 years ago
canary deployed successfully
onyx->redshift test started at 13:56:35 PDT and completed successfully at 14:11:18 PDT
scaling up to 11 hosts
Comment 5 (Assignee) • 10 years ago
scale up worked well; the old stack was scaled down to 0 hosts and min hosts in the new stack was reduced to 3
Updated by Assignee • 10 years ago
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED