Closed Bug 1152987 Opened 10 years ago Closed 10 years ago

DEPLOY updated Onyx App CFN stacks

Categories

(Content Services Graveyard :: Tiles: Ops, defect)

x86
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mostlygeek, Assigned: relud)

This is a two-in-one deploy, as it makes more sense this way due to the nature of the changes.

#1. The Onyx ELB CFN has been updated from the jenkins generated template to a static+parameters template. No other major changes were implemented.

#2. The Onyx Application stack was updated with these changes:
- added datadog agent to app servers
- changed onyx statsd to point at datadog agent on 127.0.0.1 instead of the shared graphite stack

This is the deployment plan:
- create the new ELBs as a new CloudFormation stack
- create the new app servers as a new CloudFormation stack
  - deployed with a fixed ASG size of 11 servers to prevent autoscaling down
- using R53 load balancing:
  - split traffic 90% (old) / 10% (new)
  - verify data is flowing out normally (s3, sns) to processor
  - verify statsd is flowing to datadog
  - split traffic 75 / 25, wait 10min
  - split traffic 50 / 50, wait 10min
  - split traffic 25 / 75, wait 10min
  - split traffic 0 / 100, wait 10min
- scale down the old cluster to zero onyx servers
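The R53 traffic-split steps in the plan above could be expressed as weighted record sets. This is a minimal sketch, not the configuration used for this deploy: the record name, ELB hostnames, set identifiers, and the 60 second TTL are all hypothetical placeholders, and plain weighted CNAME records are assumed (alias records would be shaped differently). The dict follows the shape that boto3's `change_resource_record_sets` takes as `ChangeBatch`, but nothing here calls AWS.

```python
# Sketch only: every name and hostname below is a hypothetical placeholder.
RECORD_NAME = "onyx.example.com."        # hypothetical record name
OLD_ELB = "old-onyx-elb.example.com"     # hypothetical old ELB hostname
NEW_ELB = "new-onyx-elb.example.com"     # hypothetical new ELB hostname

# (old weight, new weight) for each step of the deployment plan.
SPLITS = [(90, 10), (75, 25), (50, 50), (25, 75), (0, 100)]


def weighted_record(set_id, target, weight, ttl=60):
    """One weighted CNAME record set; Route 53 weights are integers 0-255."""
    return {
        "Name": RECORD_NAME,
        "Type": "CNAME",
        "SetIdentifier": set_id,
        "Weight": weight,
        "TTL": ttl,  # assumed value
        "ResourceRecords": [{"Value": target}],
    }


def change_batch(old_weight, new_weight):
    """Build one change batch that upserts both weights together."""
    return {
        "Comment": f"onyx split {old_weight}/{new_weight}",
        "Changes": [
            {"Action": "UPSERT",
             "ResourceRecordSet": weighted_record("old", OLD_ELB, old_weight)},
            {"Action": "UPSERT",
             "ResourceRecordSet": weighted_record("new", NEW_ELB, new_weight)},
        ],
    }


if __name__ == "__main__":
    for old, new in SPLITS:
        print(change_batch(old, new)["Comment"])
```

Each batch would be submitted once per step, followed by the 10 minute wait from the plan before moving to the next split.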
Assignee: nobody → dthornton
Blocks: 1152993
No longer blocks: 1152993
Depends on: 1152993
No longer depends on: 1152993
Blocks: 1153018
we aren't set up for dns-style migrations of onyx right now. the current method is:
- deploy an ASG of size 1 behind the live ELB as a canary (from jenkins)
- check that it's healthy (manually)
- update that stack to have size 3-42 (from jenkins)
- wait for scaling to complete (manually)
- reduce the previous stack size gradually to 0
- wait 7 days
- destroy old stack
Notes from IRC discussion:

Change deployment plan to be:
- deploy an ASG of size 1 behind the live ELB as a canary (via jenkins)
- check that it's healthy (manually), and sending statsd to datadog
- update new stack to have 11 onyx minimum (via jenkins)
- wait for scaling to complete, 11 healthy servers in ELB from new ASG
- reduce the previous stack size to 0 (manual)
- wait until all old onyx instances are out of the ELB
- update new stack to have ASG minSize of 3 (via Jenkins)
  - will allow auto-scaling down again
- wait at least 3 days, destroy the old onyx stack (manually)
  - create a bug for this
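The two waits in the revised plan ("11 healthy servers in ELB from new ASG" and "all old onyx instances are out of the ELB") amount to checks over the instances registered with the ELB. This is a minimal sketch of those checks, not the procedure actually used here (the bug describes them as manual). The `Instance` record, its fields, and the ASG names are hypothetical; the `InService` / `OutOfService` strings are assumed to match the states a classic ELB reports.

```python
from typing import NamedTuple


class Instance(NamedTuple):
    """Hypothetical view of one instance registered with the ELB."""
    instance_id: str
    asg: str    # name of the owning Auto Scaling group
    state: str  # "InService" or "OutOfService"


def new_stack_ready(instances, new_asg, required=11):
    """True once at least `required` instances of the new ASG are healthy."""
    healthy = sum(
        1 for i in instances if i.asg == new_asg and i.state == "InService"
    )
    return healthy >= required


def old_stack_drained(instances, old_asg):
    """True once no instance of the old ASG is registered with the ELB."""
    return not any(i.asg == old_asg for i in instances)


if __name__ == "__main__":
    # Hypothetical snapshot: 11 healthy new instances, 1 old one still attached.
    fleet = [Instance(f"i-new{n}", "onyx-new", "InService") for n in range(11)]
    fleet.append(Instance("i-old0", "onyx-old", "InService"))
    print("new stack ready:", new_stack_ready(fleet, "onyx-new"))
    print("old stack drained:", old_stack_drained(fleet, "onyx-old"))
```

Under this sketch, lowering the new stack's minSize to 3 would wait until both functions return true.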
Summary: DEPLOY updated Onyx ELB and Onyx App CFN stacks → DEPLOY updated Onyx App CFN stacks
This deployment is scheduled for 2PM today.
canary deployed successfully
onyx->redshift test started at 13:56:35 PDT and completed successfully at 14:11:18 PDT
scaling up to 11 hosts
scale up worked well, old stack scaled down to 0 hosts, min hosts in new stack reduced to 3
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED