Bug 481632 - Launch Spread Firefox site redesign
launch date: 4/30/09
Product: Infrastructure & Operations
Classification: Other
Component: WebOps: Other
Version: other
Platform: All Other
Importance: -- major
Target Milestone: ---
Assigned To: Aravind Gottipati [:aravind]
QA Contact: matthew zeier [:mrz]
Depends on:
Blocks: 487148
Reported: 2009-03-05 05:41 PST by Alex Buchanan [:abuchanan]
Modified: 2013-10-09 10:29 PDT
CC List: 9 users
justdave: needs‑downtime+

affiliates archive DB schema (2.63 KB, text/plain)
2009-04-09 11:18 PDT, Alex Buchanan [:abuchanan]
updates.sql (131.25 KB, text/plain)
2009-04-30 19:47 PDT, Alex Buchanan [:abuchanan]
manual-prereqs.sql (394 bytes, text/plain)
2009-04-30 19:47 PDT, Alex Buchanan [:abuchanan]

Description Alex Buchanan [:abuchanan] 2009-03-05 05:41:01 PST
Hey IT,

This is a "heads up" bug for launching the work done on the redesign.

Right now, the tentative date for launch is March 11th.

This will most likely cause site downtime, either by locking out all visitors, or (hopefully) only blocking logged in users.  I don't have an estimate on how long the transition will take yet.

I'll post more throughout today and tomorrow as I solidify my plan, but here is the rough draft as it sits in my head right now...

Pre-game)  Come up with content (and a method for displaying it) to tell users why the site is about to be down and what to expect.

1) Check out the redesign code that was newly tagged for production into a new directory on the prod. cluster and give it a DNS path.  This way we can leave the old site running while I set up changes on the new sandboxed-prod version.

2)  Clear existing sessions and prevent new users from logging in so that no important DB writes are made and no content is changed.

3)  Copy the production DB to a sandboxed-prod version and add this to the sandboxed-prod config.  The sandboxed-prod version should now be usable for setting things up without affecting the public site.

4)  I'll run through all the Drupal settings manually.  Drupal makes this part a PITA, so this could take a little while.

5)  We do a quick QA run through, make sure things look OK.

6)  Mary, Stephen (or a delegate of his choosing), and myself all explicitly sign off on throwing the switch in this bug.

7)  We flip the Apache/DNS config from the old directory to the new one, unlock user logins, and pray :)

Please, please let me know if anything sounds unclear and/or you have questions or concerns.  This part is always tricky (especially with Drupal) so I'm trying to make it as clear and uneventful as possible.

Also, if you have a suggestion on when is a good time (am/pm) to do this, I'd love it.

That's all I've got for now, Thanks!

Oh, almost forgot, note-to-self:  figure out how to lock-down user logins
Comment 1 Mary 2009-03-09 22:03:14 PDT
Target is now March 17th!
Comment 2 Dave Miller [:justdave] 2009-03-09 23:17:14 PDT
Calendar says oremj is oncall that week.
Comment 3 Alex Buchanan [:abuchanan] 2009-03-17 10:18:28 PDT
The launch has been pushed back from Mar. 17th.

I'll update with the new target after our meeting today.
Comment 4 Jeremy Orem [:oremj] 2009-03-18 00:04:55 PDT
What's the new target date?
Comment 5 Mary 2009-03-18 16:45:07 PDT
Hey there:  Ideally before the quarter is out, but need QA's input.
Comment 6 Alex Buchanan [:abuchanan] 2009-03-26 10:24:09 PDT
New target release date is April 7th

A couple things have changed in the launch process...

1)  Instead of trying to lock out users from logging in, we've decided to take the safer/easier route and lock the whole site down.  IT can do this by redirecting to a static site maintenance message.  Mary is working on this content.

2)  Instead of trying to do the changes by hand (silly me) I'll have IT run an updates.sql file, which should take care of most (all?) of the work.
Comment 7 Jay Patel [:jay] 2009-03-31 17:26:49 PDT
Sounds like IT/Webdev have a good handle on the launch transition (from comment #0), but just want to make sure we have a backout plan if things end up busted.

I think we have enough visitors to SFX to ensure a quick recovery if anything goes wrong.  I'm assuming the production DB is going to be backed up and that the code in SVN will be tagged before we check in the new code for launch (which is what I'm guessing the sandboxed_prod stuff is for).

Just wanted to throw that in there... I'm sure you fellas have this all covered. ;-)
Comment 8 Mary 2009-04-06 16:00:01 PDT
Hi all:  To confirm, we'd like to start the migration at 7:00 p.m. PDT on April 7th during non-peak hours.  Thanks!
Comment 9 Mary 2009-04-06 16:01:45 PDT
Also, recap of launch steps here:
Comment 10 Aravind Gottipati [:aravind] 2009-04-07 12:17:03 PDT
Can you guys please recap for me what needs to happen on the server side to do this?

From the first comment (0) and the link in comment #9, I know I have to do this for sure.

1) Put up a site down (maintenance) page.

2) Backup the db

3) update code


Remove maintenance page.

The intermediate steps there are a blur to me.  Comment 0 seems to indicate that we need a whole new db, new site etc.  Is all that still needed?

If it is, why can't all that be done/staged beforehand?
Comment 11 Alex Buchanan [:abuchanan] 2009-04-07 13:25:17 PDT
sorry, let me clarify...

1)  Shut down the site, by redirecting all traffic to the outage page in bug 487148.  We need an IP or URL where we can access the site for any manual set up and QA that needs to be done before switching the site back to public

2) back up the DB

3) update the code to tags/production@HEAD

4) We'll need a new, separate database to hold affiliates stats.  This will require creating it, uploading the schema (I will attach here shortly), and adding the config to Drupal's settings.php

  * you should be able to merge in the config changes from settings.php.dist.  It's only like 3 lines, it adds the mysql URI for the new DB

5) Run updates.sql (I'm still working on this)

6) At this point IT is on hold while we do the final QA round.

7) When Mary, Stephen or Krupa (QA), and myself all sign-off here, then IT can switch DNS from the outage page, back to the site

I think that clears up everything, did I miss anything?
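
The sign-off rule in step 7 can be sketched as a small check. This is only an illustration in Python, not anything that ran during the launch: the `REQUIRED_SIGNOFFS` structure and the `ready_to_switch_dns` function are hypothetical names, and "Alex" stands in for "myself" in the step above.

```python
# Hypothetical sketch of the step 7 gate: every group below must have at
# least one person who has signed off in the bug before DNS is switched back.
REQUIRED_SIGNOFFS = [
    {"Mary"},
    {"Stephen", "Krupa"},  # either QA person is enough
    {"Alex"},
]


def ready_to_switch_dns(signed_off):
    """Return True only when each required group has at least one sign-off."""
    signed = set(signed_off)
    return all(bool(group & signed) for group in REQUIRED_SIGNOFFS)
```

For example, `ready_to_switch_dns({"Mary", "Krupa", "Alex"})` is true, while `ready_to_switch_dns({"Mary", "Alex"})` is false because QA has not signed off yet.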
Comment 12 matthew zeier [:mrz] 2009-04-07 15:31:56 PDT
What's the downtime time estimate?
Comment 13 Alex Buchanan [:abuchanan] 2009-04-07 15:36:57 PDT
we're shooting for 1 hour
Comment 14 Mary 2009-04-07 18:48:28 PDT
Hey there:  We're running about an hour behind.  Aravind:  We'll ping you as soon as we are ready to take the site down.

Is the redirect page set up and ready to go?

Comment 15 Alex Buchanan [:abuchanan] 2009-04-08 09:18:11 PDT
For the record, things came up that forced us to push back the launch.  Our new date is Thursday, April 9th (afaik)
Comment 16 Alex Buchanan [:abuchanan] 2009-04-08 10:32:20 PDT
Note to self:  Get reCAPTCHA keys before launch
Comment 17 Alex Buchanan [:abuchanan] 2009-04-09 11:18:01 PDT
Created attachment 371896
affiliates archive DB schema
Comment 18 Alex Buchanan [:abuchanan] 2009-04-09 18:21:13 PDT
We're not launching tonight, because of affiliates issues.  We will regroup on Tuesday with a new date.

sorry for any confusion.
Comment 19 Aravind Gottipati [:aravind] 2009-04-09 19:11:03 PDT
Going back to the general queue for now.
Comment 20 Aravind Gottipati [:aravind] 2009-04-15 14:24:46 PDT
Please re-open the bug when you have a firm ETA on when this will go live.
Comment 21 Alex Buchanan [:abuchanan] 2009-04-28 15:47:32 PDT

Reopening as we're ready to launch during Thursday's launch window (4/30/09)

I've simplified the launch process a bit.  For one, there will be no separate affiliates DB anymore (this was confusing before).  I've also tried to compile everything into one update script.

I'll post more details soon, I just wanted to get this on the radar.

Also, part of this will be making up 15 days of Affiliates points by parsing traffic logs for and download.m.o.  How long would you estimate this will take?  10min? an hour?
Comment 22 Alex Buchanan [:abuchanan] 2009-04-29 13:27:19 PDT
Also, I was told yesterday that the NS/Caching creates problems with server-side user agent detection.  

This launch will push code that includes server-side user agent detection with PHP.  We use this to detect whether a user has Firefox < 3, or a non-FF browser, so that we know when to show a download button.
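
As a rough illustration of the rule described above (show the download button for Firefox < 3 or any non-Firefox browser), here is a minimal sketch. It is written in Python rather than the PHP the site actually uses, and the regex, the `should_show_download_button` name, and the treatment of a missing User-Agent header are all assumptions, not the site's real detection code.

```python
import re

# Hypothetical pattern: pulls the major version out of a "Firefox/<n>" token.
_FIREFOX_RE = re.compile(r"Firefox/(\d+)")


def should_show_download_button(user_agent):
    """Return True for non-Firefox browsers and for Firefox older than 3."""
    match = _FIREFOX_RE.search(user_agent or "")
    if match is None:
        # Not Firefox, or no User-Agent header at all (assumed to be non-FF).
        return True
    return int(match.group(1)) < 3
```

Because the result depends only on the User-Agent header, any cache sitting in front of the site would need to take that header into account for this kind of detection to behave correctly.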

One solution mentioned is that the NS can be configured to work around this by caching based on URL + user agent.  The other is to use javascript to do the detection.
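
To make the "URL + user agent" idea concrete, here is a small sketch of a cache key that varies on both values. This is not Netscaler configuration syntax; it is a Python illustration only, and the `cache_key` function and the choice of SHA-1 are assumptions made for the example.

```python
import hashlib


def cache_key(url, user_agent):
    """Build a cache key that varies on both the URL and the User-Agent."""
    # A newline separator keeps ("a", "bc") and ("ab", "c") from colliding.
    raw = url + "\n" + user_agent
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()
```

With a key like this, two browsers requesting the same URL get separate cache entries, at the cost of a lower hit rate because User-Agent strings vary widely.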

Since we don't have time to rewrite the detection code in JS, I'd like to use the NS config work-around for now.  IT, do you know how to do this?  Anything you need from me?

Also, the stage sites are behind the NS right?  No one had this problem on stage.  Any idea why?

Comment 23 Mary 2009-04-29 22:56:18 PDT
Hi IT:  We're ready to launch tomorrow night at 7 p.m.  Can someone get back to Alex re. the NS config. work-around question?  Thanks!
Comment 24 matthew zeier [:mrz] 2009-04-30 12:53:33 PDT
> Also, the stage sites are behind the NS right?  No one had this problem on
> stage.  Any idea why?

Who told you the production Netscaler would cause problems?  Stage sits behind the same pair of Netscalers as production does.
Comment 25 Alex Buchanan [:abuchanan] 2009-04-30 12:58:20 PDT
Ryan mentioned it caused problems with UA detection on AMO before.  Maybe he meant only Zeus?
Comment 26 Ryan Doherty (:rdoherty) 2009-04-30 13:02:43 PDT
(In reply to comment #25)
> Ryan mentioned it caused problems with UA detection on AMO before.  Maybe he
> meant only Zeus?

I was talking about the Netscalers. If stage is working and uses the same configuration and Netscalers as production, then it should be fine.
Comment 27 matthew zeier [:mrz] 2009-04-30 13:04:31 PDT
AMO behind the Netscaler has a lot of custom cache/no-cache rules.  sfx doesn't have those but like Ryan said, if it's working on stage it'll work in production.
Comment 28 Alex Buchanan [:abuchanan] 2009-04-30 13:05:15 PDT
sweet, thanks guys
Comment 29 Stephen Donner [:stephend] 2009-04-30 13:06:25 PDT
It should be fine, then -- I've been testing on staging, and verified the download-button bug without seeing any issues.
Comment 30 Mary 2009-04-30 14:58:00 PDT
IT:  Can we get an answer to comment #21...

"Also, part of this will be making up 15 days of Affiliates points by parsing
traffic logs for and download.m.o.  How long would you estimate this
will take?  10min? an hour?"

Comment 31 Jeremy Orem [:oremj] 2009-04-30 17:07:39 PDT
I'm not really sure how to answer, because it really all depends on how fast the log processing script is.  There are about 100G of data per day for and about 30G a day for, so ~1.5 terabytes of and ~500G of data.  The stats scripts I have seen have all been CPU bound or database bound.
Comment 32 Jeremy Orem [:oremj] 2009-04-30 17:13:00 PDT
logs are ~100,000,000 lines a day.  I'll run a line count on logs as well.
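
The estimate being asked for can be roughed out with simple arithmetic. The sketch below is in Python and uses the figures quoted in this thread (15 days of logs, about 100G and 30G of data per day, about 100,000,000 lines a day for one set of logs). The parsing throughput is a purely hypothetical placeholder, since nobody here has measured the script, and the function names are made up for the example.

```python
DAYS = 15  # number of days of logs to parse, per comment 21


def total_gigabytes(gb_per_day, days=DAYS):
    """Total log volume in gigabytes over the backfill window."""
    return gb_per_day * days


def estimated_hours(lines_per_day, lines_per_second, days=DAYS):
    """Hours to parse the window at an assumed, unmeasured throughput."""
    total_lines = lines_per_day * days
    return total_lines / lines_per_second / 3600


# Volumes quoted in the thread: about 100G/day and about 30G/day.
print(total_gigabytes(100))  # 1500, i.e. ~1.5 terabytes
print(total_gigabytes(30))   # 450, roughly the "~500G" quoted

# HYPOTHETICAL throughput of 50,000 lines/second, for illustration only.
print(round(estimated_hours(100_000_000, 50_000), 1))
```

The real answer depends entirely on whether the script is CPU bound or database bound, so the throughput figure should be replaced with a measurement from a trial run on one day of logs.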
Comment 33 Alex Buchanan [:abuchanan] 2009-04-30 19:46:29 PDT
ok, launch process is...

Log parsing...

1)  there is a cron job running, that parses and dmo logs, called  This cron job needs to be turned off for now.
2)  pull the newest version of
3)  configure this script with, and run with `perl > out.sql`

the config should be self-explanatory, please ask me if you're confused.  days should be 15 (we are parsing 15 days worth of logs)

Drupal launch...

1)  redirect to the outage page, and give us the info for back-door access to the site
2)  backup the DB
3)  svn up
4)  run manual-prereqs.sql on the DB
5)  I'll take some time to change a few things manually
6)  updates.php needs to be run, I can do this
7)  When I give you the word, run cron.php
8)  When I give you the word, run updates.sql on the DB
9)  Mary and I will manually update content, etc
10)  when we give the word, flip back to the actual site
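
The Drupal launch list above mixes steps IT can run straight away with steps that wait for "the word". A minimal sketch of that gating, in Python, might look like the following. The `Step` type, the `needs_go` flag, and `next_step` are hypothetical names introduced for this illustration; the step wording is paraphrased from the list.

```python
from typing import NamedTuple, Optional


class Step(NamedTuple):
    number: int
    action: str
    needs_go: bool  # True when IT must wait for an explicit go-ahead


DRUPAL_LAUNCH = [
    Step(1, "redirect to the outage page", False),
    Step(2, "back up the DB", False),
    Step(3, "svn up", False),
    Step(4, "run manual-prereqs.sql on the DB", False),
    Step(5, "manual changes (Alex)", False),
    Step(6, "run updates.php (Alex)", False),
    Step(7, "run cron.php", True),
    Step(8, "run updates.sql on the DB", True),
    Step(9, "manually update content (Mary and Alex)", False),
    Step(10, "flip back to the actual site", True),
]


def next_step(done, go_given):
    """Return the step after `done` completed steps, or None if blocked or finished."""
    if done >= len(DRUPAL_LAUNCH):
        return None
    step = DRUPAL_LAUNCH[done]
    if step.needs_go and not go_given:
        return None  # wait for the word before running this one
    return step
```

For instance, after six completed steps `next_step(6, go_given=False)` returns `None`, and `next_step(6, go_given=True)` returns the cron.php step.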
Comment 34 Alex Buchanan [:abuchanan] 2009-04-30 19:47:19 PDT
Created attachment 375283
updates.sql

for step #8
Comment 35 Alex Buchanan [:abuchanan] 2009-04-30 19:47:48 PDT
Created attachment 375284
manual-prereqs.sql

for step #4
Comment 36 Aravind Gottipati [:aravind] 2009-04-30 22:05:13 PDT
