964325 - Migrate affiliates.mozilla.org to the generic cluster

Reporter

Description

•

11 years ago

This is a tracker bug for moving affiliates.m.o from the engagement cluster to the generic cluster. This work is being done because the engagement cluster is no longer highly utilized and the hardware is end-of-life.

I have CC'd a few folks who may be able to work with Webops on this move.

The code and configuration will be trivial to move over. There should be no new net-flows required. The point for discussion will be the database. There are three options.

The simplest option would be to take some downtime. This is the usual approach: take the site down, migrate the database, bring the site up in the new location.

The next option can be explored if the app can be placed in read-only mode. This is the same as the usual method, but the site stays up in the old location read-only instead of being hard down. So: set the site read-only, migrate the database, switch the site to the new location read-write.

The final option also requires that the site can be placed in read-only mode. This option requires a bit more coordination and scheduling with the DBAs. First we set up a slave database in the new location which replicates from the current master. Then we place the site in the old location in read-only mode while simultaneously switching the site over to the new location. If all goes well there will be little appreciable read-only time for the end users. This option actually requires a bunch of additional background database steps and so cannot guarantee zero read-only time or zero downtime, but it is as close as things can get.

My preference would be for the second option, as it has no hard downtime and is quite safe in terms of data integrity.

I have two questions that I need answered to move forward with this project. First, can the site be placed in read-only mode? Second, when would be a good time to schedule this work? I am asking mostly about the day of week and time of day when the site has the lowest traffic or is least critical.

Keep in mind, of course, that all of these options are set in an ideal world of dreams, and cold hard reality might force some downtime during the move.

Thanks in advance for your help with this.

Jason Crowe [:jd]

Reporter

Updated

•

11 years ago

Blocks: 964338

Osmose [:osmose, :mkelly]

Comment 1

•

11 years ago

- Can the site be placed in read-only mode?

Not as currently implemented. There's a possibility to implement this, but we're in the middle of starting up a rewrite of the main part of the site, so I'm reluctant to write a fairly complex read-only feature that will be used once. In this case, my preference is downtime if it's acceptable to the product owner (Chelsea!) depending on how long we expect it to take. Do we have any estimates for how long downtime will be in the case of option 1?

- When is a good time to schedule this work?

According to Google Analytics, midnight Pacific seems to be our lowest-traffic time on the main website, but I don't have easy access to data about the other major part of the site: the referral links. Since they directly return 302s, we'd have to analyze the server logs for those URLs (example: //affiliates.mozilla.org/link/banner/52190). Is it possible to get a dump of that data?

Jason Crowe [:jd]

Reporter

Comment 2

•

11 years ago

The database dump is 114M, so it takes about 2 seconds to dump, say <30 seconds to transfer, and <30 seconds to load in the new location. Then there is a required DNS change. If we lower the TTL to 30 seconds and flip the switch at the same time we start the database dump, it can propagate concurrently. So with a test run and having all the commands staged in advance, I think we would see less than 5 minutes of downtime. For notices to end users I would say 15 minutes. That way we have a bit of leeway in case things go awry and need some troubleshooting.

This all assumes that there are no code changes or the like to muddy the waters and, as I said, a valid test in advance. This test is basically the site operational with a snapshot of the database on another IP. Then it can be vetted and QA'd, and all the actual migration requires is a refresh of the database and a DNS switch.
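As a rough check on the estimate above, the timings can be combined in a few lines of Python. This is only a sketch: the function name and parameters are made up for illustration, the durations are the upper bounds quoted in this comment, and it assumes the DNS change propagates within its TTL, which real resolvers do not always honor.

```python
def estimated_downtime_seconds(dump, transfer, load, dns_ttl):
    """Rough downtime estimate when the DNS flip starts with the dump.

    The database steps run one after another, while the DNS change
    propagates in parallel, so the longer of the two paths dominates.
    Assumes resolvers honor the TTL (an idealization).
    """
    database_path = dump + transfer + load
    return max(database_path, dns_ttl)


# Upper bounds quoted in the comment: 2 s dump, <30 s transfer,
# <30 s load, 30 s TTL.
estimate = estimated_downtime_seconds(dump=2, transfer=30, load=30, dns_ttl=30)
print(f"estimated downtime: about {estimate} seconds")
```

With those inputs the critical path is about a minute, which is why a 5-minute target and a 15-minute notice leave room for troubleshooting.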

Osmose [:osmose, :mkelly]

Comment 3

•

11 years ago

Chelsea: Thoughts on 15 minutes of downtime? I'm sure we can whip together a quick "Sorry, Affiliates is down right now!" message to show in its place.

Flags: needinfo?(cnovak)

Osmose [:osmose, :mkelly]

Comment 4

•

11 years ago

FYI, with some stats Chelsea provided, I estimate around 400 clicks on Affiliate banners would be lost in a 15 minute downtime window, based on the number of clicks from the past week. Based on the average clicks per week over the past few months, the number goes a bit lower.
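The arithmetic behind an estimate like this can be sketched as follows. The weekly click count is not stated in this bug, so the 268,800 figure below is a placeholder picked only so that the result lands on the ~400 mentioned above; the function name is hypothetical too. The sketch assumes clicks are spread evenly across the week, which overstates the loss if the migration happens at a low-traffic time.

```python
MINUTES_PER_WEEK = 7 * 24 * 60  # 10080


def clicks_lost(weekly_clicks, downtime_minutes):
    """Estimate clicks lost during downtime, assuming uniform traffic."""
    return weekly_clicks * downtime_minutes / MINUTES_PER_WEEK


# Placeholder weekly total (NOT from the bug), 15-minute downtime window.
print(clicks_lost(268_800, 15))
```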

Chelsea Novak [:chelsea]

Updated

•

11 years ago

Flags: needinfo?(cnovak)

Brandon Burton [:solarce]

Assignee

Updated

•

11 years ago

Depends on: 974155

Brandon Burton [:solarce]

Assignee

Comment 5

•

11 years ago

Now that I've completed the dev|stage migrations, we're ready to proceed with the production migration.

We'd like to schedule this migration for next week. Our two options for scheduling are:

1. Tue, Wed, or Thu (3/4,5,6) at 5:30AM PST
2. Sat, 3/8 @ 7AM PST

The choice depends on when a developer is available to perform testing and QA post-migration.

The migration will take 60 minutes, during which the site will be redirected to hardhat.mozilla.net. The plan of action is roughly:

* rsync code and uploaded files to generic cluster
  * I'll have a copy of production finished tomorrow, so the sync up will be quick
* dump database, copy, and import into generic cluster
* put new settings files in place and test push with chief
* do testing via /etc/hosts
* cutover DNS

Rollback plan is:

* Fail DNS back to existing cluster

Please let me know which of the two time options will work for developer availability so that we can schedule this.

Thanks

Assignee: server-ops-webops → bburton

Flags: needinfo?(mkelly)

Osmose [:osmose, :mkelly]

Comment 6

•

11 years ago

Chelsea has spoken, and the migration shall happen on SATURDAY SATURDAY SATURDAY

Flags: needinfo?(mkelly)

Brandon Burton [:solarce]

Assignee

Comment 7

•

11 years ago

Great, I've sent a meeting invite as a reminder.

I'll join #affiliates and coordinate testing and such there.

Whiteboard: [MIGRATION: 2014-03-08 10:00AM PST]

Brandon Burton [:solarce]

Assignee

Updated

•

11 years ago

Whiteboard: [MIGRATION: 2014-03-08 10:00AM PST] → [MIGRATION: 2014-03-08 07:00AM PST]

Brandon Burton [:solarce]

Assignee

Comment 8

•

11 years ago

affiliates.mozilla.org 95% complete:
------------------------------
* puppet updates
  - apache config
  - crontab
  - manifest bits for weblogs
* copied /src/affiliates.mozilla.org directory to genericadm
* dumped and moved the database to generic1.db.phx1
* NFS for user uploads migrated and content copied
* rabbitmq config pushed with puppet and django config updated
* chief config copied
* chief push works: http://genericadm.private.phx1.mozilla.com/chief/affiliates.prod/logs/7d22304ada7ffdb893fe8305630ec11eb84cfab5.1393868816
* celeryd manifests copied and deployed with puppet
* local.py updated with db, memcache, celery configs
* commander_settings.py updated and confirmed working with chief push above
* cronjobs are running as expected
* deploy worked as noted above
----------------

You can test out the site by adding the following to your /etc/hosts[1][2] file

63.245.217.86 affiliates.mozilla.org

Let me know if you have any questions, but everything looks good for the push tomorrow

[1] http://osxdaily.com/2012/08/07/edit-hosts-file-mac-os-x/
[2] http://helpdeskgeek.com/windows-7/windows-7-hosts-file/
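For anyone testing this way, a small Python sketch like the one below can confirm that a hosts-file override is actually in place (and, after the cutover, that it has been removed again). The helper names are hypothetical and not part of any tooling mentioned in this bug; the sketch works on hosts-file text passed in as a string and assumes the first entry for a hostname wins.

```python
def parse_hosts(text):
    """Map each hostname in hosts-file text to the first IP listed for it."""
    mapping = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            # Assumption: the first entry for a name takes precedence.
            mapping.setdefault(name.lower(), ip)
    return mapping


def override_active(text, hostname, expected_ip):
    """Return True if hostname is pinned to expected_ip in the hosts text."""
    return parse_hosts(text).get(hostname.lower()) == expected_ip


sample = """\
127.0.0.1   localhost
63.245.217.86 affiliates.mozilla.org  # temporary: generic cluster test
"""
print(override_active(sample, "affiliates.mozilla.org", "63.245.217.86"))
```

To check a real machine, read the contents of /etc/hosts and pass them in as `text`. Remember to delete the entry once testing is done, otherwise the browser keeps ignoring real DNS.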

Osmose [:osmose, :mkelly]

Comment 9

•

11 years ago

(In reply to Brandon Burton [:solarce] from comment #8)
> Let me know if you have any questions, but everything looks good for the
> push tomorrow

Tomorrow? Don't you mean Saturday?

Brandon Burton [:solarce]

Assignee

Comment 10

•

11 years ago

(In reply to Michael Kelly [:mkelly,:Osmose] from comment #9)
> (In reply to Brandon Burton [:solarce] from comment #8)
> > Let me know if you have any questions, but everything looks good for the
> > push tomorrow
>
> Tomorrow? Don't you mean Saturday?

This is not reps.mo?! IS THIS SPARRRTTTTAAAA??!?!?!?!?!

(yes, saturday)

Brandon Burton [:solarce]

Assignee

Comment 11

•

11 years ago

Site migrated to generic

* media files rsync'd to new NFS location
* database copied to new db cluster
* DNS updated
* Mana updated: https://mana.mozilla.org/wiki/display/websites/affiliates.mozilla.org
* leaderboard crons run

Post-flight checks from solarce and Osmose are all green

Status: NEW → RESOLVED

Closed: 11 years ago

Resolution: --- → FIXED

Osmose [:osmose, :mkelly]

Comment 12

•

11 years ago

Verified, mothership and Facebook app look good from my testing.

Status: RESOLVED → VERIFIED

Nobody; OK to take it and work on it

Updated

•

9 years ago

Product: Infrastructure & Operations → Infrastructure & Operations Graveyard

Categories

(Infrastructure & Operations Graveyard :: WebOps: Engagement, task)

Tracking

(Not tracked)

People

(Reporter: jd, Assigned: bburton)

Details

(Keywords: spring-cleaning, Whiteboard: [MIGRATION: 2014-03-08 07:00AM PST])

Crash Data

Security

(public)
