Closed Bug 1253367 Opened 8 years ago Closed 8 years ago

Review and create a migration plan for aus-admin

Categories

(Release Engineering Graveyard :: Applications: Balrog (backend), defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mostlygeek, Unassigned)

Balrog's administration interface is currently at aus4-admin.mozilla.org, which is a CNAME to aus4-admin-external.zlb.scl3.mozilla.com and resolves to an internal IP address in SCL3.

For the migration, we will likely choose a new DNS endpoint that resolves to a public IP address. This public IP address will be firewalled to accept traffic only from a specific set of the outgoing NAT instances in SCL3. Signing in will also require LDAP authentication.
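
To illustrate the source restriction described above, here is a minimal Python sketch of an allowlist check. The network ranges are placeholders taken from the reserved documentation blocks, not the real SCL3 NAT addresses (the bug does not list them), and in practice the rule would live in the firewall or security group rather than in application code.

```python
import ipaddress

# Hypothetical NAT egress ranges; the real SCL3 addresses are not given in this bug.
ALLOWED_NAT_RANGES = [
    ipaddress.ip_network("203.0.113.0/28"),
    ipaddress.ip_network("198.51.100.17/32"),
]


def is_allowed_source(source_ip: str) -> bool:
    """Return True if source_ip falls inside any allowlisted NAT range."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in ALLOWED_NAT_RANGES)
```

A request that passes this check would still need to authenticate via LDAP; the allowlist only narrows who can reach the sign-in page.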
A few things to figure out: 

- Where is aus4-admin.m.o DNS name baked into right now? 
- Is it easier to update aus4-admin.m.o to CNAME to a new balrog-admin.m.o?
- for posterity, would it be better to just migrate everything to balrog-admin.m.o?
- all servers will just go to the new host in CloudOps AWS
- is balrog-admin.mozilla.org an acceptable name? Perhaps aus-admin.mozilla.org is better
(In reply to Benson Wong [:mostlygeek] from comment #1)
> A few things to figure out: 
> 
> - Where is aus4-admin.m.o DNS name baked into right now? 
> - Is it easier to update aus4-admin.m.o to CNAME to a new balrog-admin.m.o?
> - for posterity, would it be better to just migrate everything to
> balrog-admin.m.o?
> - all servers will just go to the new host in CloudOps AWS
> - is balrog-admin.mozilla.org an acceptable name? Perhaps
> aus-admin.mozilla.org is better

I'm not tied to "aus4-admin" as a name; in fact, "aus-admin" would probably be better. I'd avoid "balrog-admin", because that's more of a code name. The CNAME sounds like a good idea, because then we can update the clients to use the new name at convenient times, rather than trying to change them all immediately.
How does this sound for a migration plan:

- point aus-admin.mozilla.org at Balrog in AWS
  - TLS cert will have an ALT NAME for aus4-admin.mozilla.org
- disable HTTP services on aus4-admin-external.zlb.scl3.mozilla.com, so DB gets no new writes
- migrate database to AWS
- test to make sure it works
- update aus4-admin.m.o == CNAME ==> aus-admin.m.o (60sec ttl)
- test to make sure aus4-admin.m.o works
- test to make sure aus-admin.m.o works
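
One of the "test to make sure it works" steps could be to confirm that the certificate served by the new host covers both names. The sketch below is an assumption about how that might be checked, not part of the plan itself: `fetch_cert` and `missing_sans` are hypothetical helper names, and the comparison is an exact match that does not handle wildcard entries.

```python
import socket
import ssl


def fetch_cert(host: str, port: int = 443) -> dict:
    """Fetch the validated peer certificate as a dict (requires network access)."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


def missing_sans(cert: dict, hostnames: list[str]) -> list[str]:
    """Return the hostnames that are NOT listed as DNS subjectAltName entries."""
    sans = {
        value.lower()
        for kind, value in cert.get("subjectAltName", ())
        if kind == "DNS"
    }
    return [h for h in hostnames if h.lower() not in sans]
```

For example, `missing_sans(fetch_cert("aus-admin.mozilla.org"), ["aus-admin.mozilla.org", "aus4-admin.mozilla.org"])` should return an empty list once the certificate carries both names.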

Rollback plan: 

- revert aus4-admin.m.o DNS back to aus4-admin-external.zlb.scl3.mozilla.com (if changed)
- enable HTTP services on aus4-admin-external.zlb.scl3.mozilla.com 
- troubleshoot, repeat migration plan
Flags: needinfo?(jvehent)
Flags: needinfo?(bhearsum)
Summary: Review and create a migration plan for balrog-admin → Review and create a migration plan for aus-admin
(In reply to Benson Wong [:mostlygeek] from comment #3)
> How does this sound for a migration plan:
> 
> - point aus-admin.mozilla.org at Balrog in AWS
>   - TLS cert will have an ALT NAME for aus4-admin.mozilla.org
> - disable HTTP services on aus4-admin-external.zlb.scl3.mozilla.com, so DB
> gets no new writes
> - migrate database to AWS

Is it possible to do an initial migration a day or so ahead of time, and then do an incremental copy after we disable writes? The database is 175GB or so, but doesn't change a ton over the course of the day, so this would greatly reduce the amount of time that we need to disable writes for. If we can get this time down low enough we can probably do it without a tree closure window.
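
As a rough illustration of why the two-phase copy shortens the write freeze, here is a back-of-the-envelope estimate in Python. Only the 175 GB figure comes from this comment; the daily change volume and the transfer throughput are made-up assumptions, so the output is indicative at best.

```python
def copy_seconds(size_gb: float, throughput_mb_per_s: float) -> float:
    """Estimate transfer time in seconds for size_gb at a sustained throughput."""
    return size_gb * 1024 / throughput_mb_per_s


FULL_DB_GB = 175                  # from the comment above
ASSUMED_DAILY_CHANGE_GB = 2       # hypothetical: "doesn't change a ton"
ASSUMED_THROUGHPUT_MB_S = 50      # hypothetical sustained copy rate

full = copy_seconds(FULL_DB_GB, ASSUMED_THROUGHPUT_MB_S)
incremental = copy_seconds(ASSUMED_DAILY_CHANGE_GB, ASSUMED_THROUGHPUT_MB_S)
print(f"full copy: {full / 60:.0f} min, incremental copy: {incremental / 60:.1f} min")
```

Under these assumptions, writes would only need to be disabled for the incremental copy, not for the full one.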

> - test to make sure it works
> - update aus4-admin.m.o == CNAME ==> aus-admin.m.o (60sec ttl)
> - test to make sure aus4-admin.m.o works
> - test to make sure aus-admin.m.o works

How do aus4/aus5.mozilla.org figure into this? Will they be migrating at the same time, ahead of time, or after? They could survive a short period of time being out of sync if necessary.
Flags: needinfo?(bhearsum) → needinfo?(bwong)
> - disable HTTP services on aus4-admin-external.zlb.scl3.mozilla.com, so DB gets no new writes

It might be worth asking webops if the ZLB could be pointed at the new aus-admin in AWS, for a day or two before being shut off, for clients that cache DNS.

Otherwise, r+ on the plan.
Flags: needinfo?(jvehent)
> Is it possible to do an initial migration a day or so ahead of time, and then do an incremental copy after we disable writes?

Yes. AWS has a recommended process [1] to migrate with reduced downtime.

[1] http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/MySQL.Procedural.Importing.NonRDSRepl.html
Flags: needinfo?(bwong)
> It might be worth asking webops if the ZLB could be pointed at the new aus-admin in AWS, for a day or two before being shut off, for clients that cache DNS.

I'm not concerned with clients that do not respect DNS TTL. Getting a TCP connect failure to the old servers should trigger a DNS refresh. If not, they can fail until DNS is properly updated.
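
The client behaviour assumed here (re-resolve after a failed TCP connect) can be sketched as follows. This is a hypothetical illustration, not how any particular Balrog client is implemented; `connect_with_refresh`, `system_resolve`, and `tcp_connect` are names invented for the example, and whether a repeat lookup actually returns fresh data depends on the resolver's caching and the record's TTL.

```python
import socket


def system_resolve(host: str) -> list[str]:
    """Look up the host's current addresses via the system resolver."""
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    return [info[4][0] for info in infos]


def tcp_connect(ip: str, port: int) -> socket.socket:
    """Open a TCP connection to a specific IP address."""
    return socket.create_connection((ip, port), timeout=5)


def connect_with_refresh(host, port, resolve=system_resolve,
                         connect=tcp_connect, attempts=3):
    """Try to connect, performing a new DNS lookup before every attempt."""
    last_error = None
    for _ in range(attempts):
        for ip in resolve(host):  # new lookup each attempt, never a stored IP
            try:
                return connect(ip, port)
            except OSError as exc:  # connect failure: fall through and re-resolve
                last_error = exc
    raise ConnectionError(f"could not reach {host}:{port}") from last_error
```

A client that instead stores the resolved IP and never looks it up again would keep failing against the old servers until it is restarted.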
I suspect we're done here :)
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Product: Release Engineering → Release Engineering Graveyard