Closed Bug 964320 Opened 11 years ago Closed 11 years ago

Migrate reps.mozilla.org to the generic cluster

Categories

(Infrastructure & Operations Graveyard :: WebOps: Engagement, task)

Platform: x86_64 Linux
Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: jd, Assigned: bburton)

Details

(Keywords: spring-cleaning, Whiteboard: [MIGRATION: 2014-03-04 05:30AM PST])

This is a tracker bug for moving reps.m.o from the engagement cluster to the generic cluster. This work is being done because the engagement cluster is no longer highly utilized and its hardware is end-of-life. I have CC'd a few folks who may be able to work with WebOps on this move.

The code and configuration will be trivial to move over, and no new net-flows should be required. The point for discussion will be the database. There are three options:

1. Take some downtime. This is the usual approach: take the site down, migrate the database, and bring the site up in the new location.
2. If the app can be placed in read-only mode, keep the site up in the old location in read-only mode instead of taking it hard down. So: set the site read-only, migrate the database, and switch the site to the new location read-write.
3. This option also requires that the site can be placed in read-only mode, and it needs a bit more coordination and scheduling with the DBAs. First we set up a slave database in the new location that replicates from the current master. Then we place the site in the old location in read-only mode while simultaneously switching the site over to the new location. If all goes well, end users will see very little read-only time. This option actually requires a number of additional background database steps, so it cannot guarantee zero read-only time or zero downtime, but it is as close as things can get.

My preference is the second option, as it has no hard downtime and is quite safe in terms of data integrity.

I have two questions that need answers before I can move forward with this project. First, can the site be placed in read-only mode? Second, when would be a good time to schedule this work? I am asking mainly about the day of week and time of day when the site has the lowest traffic or is least critical.

Keep in mind, of course, that all of these options assume an ideal world, and cold, hard reality might force some downtime during the move. FWIW, this move will also fix a number of the issues we have seen recently around poor performance due to the SeaMicro Atom servers. I have set this bug to block the most recent bug on that issue. Thanks in advance for your help with this.
Blocks: 964338
Blocks: 869213
Sundays are the least busy days for reps.m.o according to Google Analytics, and the other days get roughly the same traffic. I believe the site can be set to read-only mode, provided we notify Reps a few days in advance so they can plan accordingly. Do you have an estimate of the time required to complete the migration?
Given some advance time to prepare for the migration, the actual cutover should be on the order of 30 minutes. The downtime, or cutover time, will be essentially zero if the site is in read-only mode. There may be some small oddities when the IP is cut over, but these should last less than 30 seconds and will not affect everyone. The 30 minutes, then, represents the read-only time.
Blocks: 974155
No longer blocks: 974155
Depends on: 974155
Now that I've completed the dev and stage migrations, we're ready to proceed with the production migration. We'd like to schedule it for next week. Our two scheduling options are:

1. Tue, Wed, or Thu (3/4, 3/5, 3/6) at 5:30AM PST
2. Sat, 3/8 @ 7AM PST

The choice depends on when a developer is available to perform testing and QA after the migration.

The migration will take 60 minutes, during which the site will be redirected to hardhat.mozilla.net. The plan of action is roughly:

* rsync code and uploaded files to the generic cluster
  * I'll have a copy of production finished tomorrow, so the sync will be quick
* dump the database, copy it, and import it into the generic cluster
* put the new settings files in place and test a push with Chief
* do testing via /etc/hosts
* cut over DNS

The rollback plan is:

* fail DNS back to the existing cluster

Please let me know which of the two time options works for developer availability so that we can schedule this.

Thanks
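The ordering in the plan of action above matters because only the DNS cutover needs undoing: until DNS moves, the old cluster is still the target, and the stated rollback is simply to fail DNS back. A minimal sketch of that control flow follows, assuming each step can be modeled as a callable that reports success; `run_migration`, the step labels, and the `verify` hook (standing in for the developer testing and QA after the migration) are hypothetical and not part of any real tooling used here.

```python
from typing import Callable, List, Tuple

# Hypothetical step type: a label plus a callable that returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_migration(
    pre_cutover: List[Step],
    cutover_dns: Callable[[], None],
    verify: Callable[[], bool],
    rollback_dns: Callable[[], None],
) -> str:
    """Run pre-cutover steps in order, switch DNS, and fail DNS back if QA fails."""
    # DNS still points at the existing cluster, so a failed step only aborts the run.
    for name, step in pre_cutover:
        if not step():
            return f"aborted before cutover: {name}"
    # Point the site at the generic cluster.
    cutover_dns()
    # Hypothetical post-migration QA check; on failure, apply the rollback plan.
    if not verify():
        rollback_dns()
        return "rolled back to existing cluster"
    return "migrated"
```

The design choice this illustrates is that nothing before the cutover is destructive to the live site's routing, so the rollback path only has to handle one step.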
Assignee: server-ops-webops → bburton
Flags: needinfo?(jgiannelos)
Flags: needinfo?(giorgos)
Hi :solarce, the first time option works for me. I don't have a preference on the day. Thanks.
Flags: needinfo?(jgiannelos)
Same here. Shall we agree on Tue, 4 March at 5:30AM PST?
Flags: needinfo?(giorgos)
(In reply to Giorgos Logiotatidis [:giorgos] from comment #5)
> Same here. Let's agree on Tue 4 March 5:30AM PST?

Excellent, we're confirmed for 3/4 @ 5:30AM PST. I'll join #remo-dev before I start and coordinate testing and such in there.
Whiteboard: [MIGRATION: 2014-03-04 05:30AM PST]
Blocks: 978436
No longer blocks: 978436
reps.mozilla.org 95% complete:
------------------------------
* puppet updates
  - apache config
  - crontab
  - manifest bits for weblogs
* copied /src/reps.mozilla.org directory to genericadm
* dumped and moved the database to generic1.db.phx1
* NFS for user uploads migrated and content copied
* rabbitmq config pushed with puppet and django config updated
* chief config copied
* chief push works: http://genericadm.private.phx1.mozilla.com/chief/reps.prod/logs/7bbf7540c343d4121d125396fcf36e54509d7aeb.1393855962
* celeryd manifests copied and deployed with puppet
* local.py updated with db, memcache, celery configs
* commander_settings.py updated and confirmed working with chief push above
* cronjobs are running as expected
* deploy worked as noted above
----------------

You can test out the site by adding the following to your /etc/hosts file[1][2]:

63.245.217.86 reps.mozilla.org

Let me know if you have any questions, but everything looks good for the push tomorrow.

[1] http://osxdaily.com/2012/08/07/edit-hosts-file-mac-os-x/
[2] http://helpdeskgeek.com/windows-7/windows-7-hosts-file/
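Before testing through a hosts-file override like the 63.245.217.86 line above, it can help to confirm which address the file actually maps reps.mozilla.org to, since a stale earlier entry would send you to the old cluster. A minimal sketch, assuming standard hosts-file syntax (an IP followed by one or more hostnames, with `#` starting a comment) and assuming the first matching line wins, as is typical; `parse_hosts` is a hypothetical helper, not an existing tool.

```python
from typing import Dict


def parse_hosts(text: str) -> Dict[str, str]:
    """Map each hostname in hosts-file text to the IP on its first matching line."""
    mapping: Dict[str, str] = {}
    for raw in text.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            # Keep the first entry, since lookups typically stop at the first match.
            mapping.setdefault(name.lower(), ip)
    return mapping
```

For example, `parse_hosts(open("/etc/hosts").read()).get("reps.mozilla.org")` should return `"63.245.217.86"` once the override is in place and no earlier line lists the same hostname.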
I visited a couple of pages on the new host, logged in, and edited and saved an event. Looks OK. Thanks for the update, :solarce.
Site has been migrated to the generic cluster.

* https://mana.mozilla.org/wiki/pages/viewpage.action?pageId=1082416 has been updated with new information
* https://mana.mozilla.org/wiki/pages/viewpage.action?pageId=1082416#reps.mozilla.org%28ReMo%29-Update/Pushprocedure has the new Chief URLs
* Confirmed a push through Chief works and pushbot has the correct URLs

Thanks everyone for a smooth migration.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Verified; looks good on prod.
Status: RESOLVED → VERIFIED
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard