Closed Bug 481632 Opened 15 years ago Closed 15 years ago

Launch Spread Firefox site redesign

Categories

(Infrastructure & Operations Graveyard :: WebOps: Other, task)

Platform: All / Other
Type: task
Priority: Not set
Severity: major

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: abuchanan, Assigned: aravind)

References

()

Details

(Whiteboard: launch date: 4/30/09)

Attachments

(2 files, 1 obsolete file)

Hey IT,

This is a "heads up" bug for launching the work done on the Spreadfirefox.com redesign.

Right now, the tentative launch date is March 11th.

This will most likely cause site downtime, either by locking out all visitors or (hopefully) by only blocking logged-in users.  I don't have an estimate of how long the transition will take yet.

I'll post more throughout today and tomorrow as I solidify my plan, but here is the rough draft as it sits in my head right now...

Pre-game)  Come up with content (and a method for displaying it) to tell users why the site is about to be down and what to expect.

1) Check out the newly tagged for production redesign code in a new directory on the prod. cluster and give it a DNS path.  This way we can leave the old site running while I set up changes on the new sandboxed-prod version.

2)  Clear existing sessions and prevent new users from logging in, so that no important DB writes are made and no content is changed.

3)  Copy the production DB to a sandboxed-prod version and add this to the sandboxed-prod config.  The sandboxed-prod version should now be usable for setting things up without affecting the public site.

4)  I'll run through all the Drupal settings manually.  Drupal makes this part a PITA, so this could take a little while.

5)  We do a quick QA run through, make sure things look OK.

6)  Mary, Stephen (or a delegate of his choosing), and myself all explicitly sign off on throwing the switch in this bug.

7)  We flip the Apache/DNS config from the old directory to the new one, unlock user logins, and pray :)

Please, please let me know if anything sounds unclear or if you have questions or concerns.  This part is always tricky (especially with Drupal), so I'm trying to make it as clear and uneventful as possible.

Also, if you have a suggestion on when is a good time (am/pm) to do this, I'd love it.

That's all I've got for now, Thanks!

Oh, almost forgot, note to self:  figure out how to lock down user logins
Target is now March 17th!
Calendar says oremj is oncall that week.
Flags: needs-downtime+
Whiteboard: Tuesday, March 17
The launch has been pushed back from March 17th.

I'll update with new target after our meeting today
Whiteboard: Tuesday, March 17
What's the new target date?
Hey there:  Ideally before the quarter is out, but we need QA's input.
Whiteboard: waiting release date
New target release date is April 7th

A couple things have changed in the launch process...

1)  Instead of trying to lock out users from logging in, we've decided to take the safer/easier route and lock the whole site down.  IT can do this by redirecting spreadfirefox.com to a static site maintenance message.  Mary is working on this content.

2)  Instead of trying to do the changes by hand (silly me), I'll have IT run an updates.sql file, which should take care of most (all?) of the work.
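For illustration only, here is a minimal Python sketch of the kind of gate item 1 describes, combined with the back-door access the team asks for later in this bug. Everything about it is an assumption: the bug does not say how IT implemented the outage page, the back-door IP is a made-up documentation address, and the one-hour Retry-After value is a guess. Instead of an HTTP redirect, this variant serves a 503 with Retry-After in place, which tells clients and crawlers the outage is temporary.

```python
# Sketch: serve a static maintenance page to everyone except a back-door
# allowlist, so the team can still reach the real site for setup and QA.
# Assumptions (not from the bug): the site is a WSGI app, the back-door IP
# is hypothetical, and Retry-After is a guessed one-hour window.

MAINTENANCE_HTML = (
    b"<!doctype html><title>Spread Firefox is down for maintenance</title>"
    b"<p>We're launching a redesign. Please check back soon.</p>"
)

# Hypothetical back-door address (192.0.2.0/24 is reserved for documentation).
BACKDOOR_IPS = frozenset({"192.0.2.10"})


def maintenance_gate(site_app, allowed_ips=BACKDOOR_IPS, retry_after_s=3600):
    """Wrap a WSGI app: allowlisted clients pass through, others get a 503."""

    def app(environ, start_response):
        if environ.get("REMOTE_ADDR") in allowed_ips:
            return site_app(environ, start_response)
        start_response(
            "503 Service Unavailable",
            [
                ("Content-Type", "text/html; charset=utf-8"),
                ("Content-Length", str(len(MAINTENANCE_HTML))),
                ("Retry-After", str(retry_after_s)),
            ],
        )
        return [MAINTENANCE_HTML]

    return app
```

To try it locally, pass a real WSGI app to `maintenance_gate` and serve the result with `wsgiref.simple_server.make_server`.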
Whiteboard: waiting release date → 04/07/2009
Sounds like IT/Webdev have a good handle on the launch transition (from comment #0), but just want to make sure we have a backout plan if things end up busted.

We get enough visitors to SFX that we need to be able to recover quickly if anything goes wrong.  I'm assuming the production DB is going to be backed up and that the code in SVN will be tagged before we check in the new code for launch (which is what I'm guessing the sandboxed_prod stuff is for).

Just wanted to throw that in there... I'm sure you fellas have this all covered. ;-)
Hi all:  To confirm, we'd like to start the migration at 7:00 p.m. PDT on April 7th during non-peak hours.  Thanks!
Can you guys please recap for me what needs to happen on the server side to do this?

From the first comment (0) and the link in comment #9, I know I have to do this for sure.

1) Put up a site down (maintenance) page.

2) Back up the DB

3) Update code

.....

Remove maintenance page.

The intermediate steps there are a blur to me.  Comment 0 seems to indicate that we need a whole new db, new site etc.  Is all that still needed?

If it is, why can't all that be done/staged beforehand?
sorry, let me clarify...

1)  Shut down the site by redirecting all spreadfirefox.com traffic to the outage page in bug 487148.  We need an IP or URL where we can access the site for any manual setup and QA that needs to be done before switching the site back to public.

2) back up the DB

3) update the code to tags/production@HEAD

4) We'll need a new, separate database to hold affiliates stats.  This will require creating it, uploading the schema (I will attach here shortly), and adding the config to Drupal's settings.php

  * You should be able to merge in the config changes from settings.php.dist.  It's only about 3 lines; it adds the MySQL URI for the new DB.

5) Run updates.sql (I'm still working on this)

6) At this point IT is on hold while we do the final QA round.

7) When Mary, Stephen or Krupa (QA), and I have all signed off here, IT can switch spreadfirefox.com DNS from the outage page back to the site.

I think that covers everything. Did I miss anything?
What's the downtime time estimate?
we're shooting for 1 hour
Hey there:  We're running about an hour behind.  Aravind:  We'll ping you as soon as we are ready to take the site down.

Is the redirect page set up and ready to go?

Thanks!
For the record, things came up that forced us to push back the launch.  Our new date is Thursday, April 9th (afaik)
Whiteboard: 04/07/2009 → 04/09/2009
Note to self:  Get reCAPTCHA keys for spreadfirefox.com before launch
Assignee: server-ops → aravind
Attached file affiliates archive DB schema (obsolete)
We're not launching tonight, because of affiliates issues.  We will regroup on Tuesday with a new date.

sorry for any confusion.
Whiteboard: 04/09/2009 → tbd
Going back to the general queue for now.
Assignee: aravind → server-ops
Please re-open the bug when you have a firm ETA on when this will go live.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → INCOMPLETE
Hey,

Reopening as we're ready to launch during Thursday's launch window (4/30/09)

I've simplified the launch process a bit.  For one, there will be no separate affiliates DB anymore (this was confusing before).  I've also tried to compile everything into one update script.

I'll post more details soon, I just wanted to get this on the radar.

Also, part of this will be making up 15 days of Affiliates points by parsing traffic logs for moz.com and download.m.o.  How long would you estimate this will take?  10min? an hour?
Severity: minor → major
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
Whiteboard: tbd → launch date: 4/30/09
Also, I was told yesterday that the NS/Caching creates problems with server-side user agent detection.  

This launch will push code that includes server-side user agent detection with PHP.  We use this to detect whether a user has Firefox < 3 or a non-FF browser, so that we know when to show a download button.
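The detection code itself is PHP and isn't shown in this bug. As a hypothetical illustration of the rule described (show the button for non-Firefox browsers and for Firefox < 3), here is a Python sketch; the function name and the regex over the `Firefox/<version>` token are assumptions, and real UA sniffing has more edge cases than this handles.

```python
import re

# Matches the "Firefox/<major>" token, e.g. "Firefox/2.0.0.20" or "Firefox/3.0.10".
# Assumption: the major version is the first run of digits after "Firefox/".
_FIREFOX_TOKEN = re.compile(r"\bFirefox/(\d+)")


def show_download_button(user_agent: str) -> bool:
    """Return True for non-Firefox browsers and for Firefox older than 3."""
    match = _FIREFOX_TOKEN.search(user_agent or "")
    if match is None:
        return True  # no Firefox token: treat as a non-FF browser
    return int(match.group(1)) < 3
```

Because the answer depends only on the request's User-Agent header, any cache that stores one copy per URL will hand the same button decision to every browser.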

One solution mentioned is that the NS can be configured to work around this by caching based on URL + user agent.  The other is to use javascript to do the detection.
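To see why a shared cache breaks this, consider a toy cache in Python. It is hypothetical and says nothing about the Netscaler's actual behavior or configuration: if the key is the URL alone, whichever variant is rendered first is served to everyone; adding the User-Agent to the key, as suggested, keeps the variants apart.

```python
from typing import Callable, Dict, Tuple


class ResponseCache:
    """Toy page cache showing why the key matters when the page body
    depends on the User-Agent header."""

    def __init__(self, render: Callable[[str, str], str], vary_on_ua: bool) -> None:
        self._render = render          # produces the page for (url, user_agent)
        self._vary_on_ua = vary_on_ua  # False: URL-only key; True: URL + UA key
        self._store: Dict[Tuple[str, str], str] = {}

    def get(self, url: str, user_agent: str) -> str:
        key = (url, user_agent if self._vary_on_ua else "")
        if key not in self._store:
            # Cache miss: render once and reuse for every later request
            # that maps to the same key.
            self._store[key] = self._render(url, user_agent)
        return self._store[key]
```

Keying on the raw User-Agent string fragments the cache, since UA strings vary widely; a common refinement is to key on a small normalized bucket (for example, the yes/no result of the detection rule). At the HTTP level, a `Vary: User-Agent` response header is the standard way for an origin to tell caches that a response differs by User-Agent.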

Since we don't have time to rewrite the detection code in JS, I'd like to use the NS config workaround for now.  IT, do you know how to do this?  Anything you need from me?

Also, the stage sites are behind the NS right?  No one had this problem on stage.  Any idea why?

Thanks
Hi IT:  We're ready to launch tomorrow night at 7 p.m.  Can someone get back to Alex re: the NS config workaround question?  Thanks!
> Also, the stage sites are behind the NS right?  No one had this problem on
> stage.  Any idea why?

Who told you the production Netscaler would cause problems?  Stage sits behind the same pair of Netscalers as production does.
Ryan mentioned it caused problems with UA detection on AMO before.  Maybe he meant only Zeus?
(In reply to comment #25)
> Ryan mentioned it caused problems with UA detection on AMO before.  Maybe he
> meant only Zeus?

I was talking about the Netscalers. If stage is working and uses the same configuration and Netscalers as production, then it should be fine.
AMO behind the Netscaler has a lot of custom cache/no-cache rules.  sfx doesn't have those but like Ryan said, if it's working on stage it'll work in production.
sweet, thanks guys
It should be fine, then -- I've been testing on staging, and verified the download-button bug without seeing any issues.
IT:  Can we get an answer to comment #21...

"Also, part of this will be making up 15 days of Affiliates points by parsing
traffic logs for moz.com and download.m.o.  How long would you estimate this
will take?  10min? an hour?"

Thanks!
I'm not really sure how to answer, because it really all depends on how fast the log processing script is.  There are about 100G of data per day for mozilla.com and about 30G a day for download.mozilla.org, so ~1.5 terabytes of mozilla.com and ~500G of download.mozilla.org data.  The stats scripts I have seen have all been CPU bound or database bound.
download.mozilla.org logs are ~100,000,000 lines a day.  I'll run a line count on mozilla.com logs as well.
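To put those numbers in perspective, here is a back-of-the-envelope sketch in Python. The per-day sizes and line counts come from the two comments above; the 50 MB/s read throughput is purely an assumption, and since the stats scripts tend to be CPU- or database-bound, the real run time would likely be longer.

```python
# Back-of-the-envelope sizing for re-parsing 15 days of logs, using the
# per-day figures quoted in this bug (decimal units: 1 TB = 1000 GB).
DAYS = 15
GB_PER_DAY = {"mozilla.com": 100, "download.mozilla.org": 30}
DMO_LINES_PER_DAY = 100_000_000


def total_gb(gb_per_day, days):
    """Total gigabytes per site over the whole window."""
    return {site: size * days for site, size in gb_per_day.items()}


def hours_to_read(total_gigabytes, mb_per_second):
    """Hours needed just to stream the data at a sustained throughput."""
    return total_gigabytes * 1000 / mb_per_second / 3600


totals = total_gb(GB_PER_DAY, DAYS)
grand_total = sum(totals.values())
dmo_lines = DMO_LINES_PER_DAY * DAYS

# Hypothetical throughput; not measured anywhere in this bug.
ASSUMED_MB_PER_S = 50
print(totals, grand_total, dmo_lines, round(hours_to_read(grand_total, ASSUMED_MB_PER_S), 1))
```

With these figures, 15 days comes to 1,500 GB + 450 GB = 1,950 GB and about 1.5 billion download.mozilla.org lines. Just reading that at the assumed 50 MB/s takes about 10.8 hours, well outside the "10min? an hour?" range.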
Assignee: server-ops → aravind
Status: REOPENED → ASSIGNED
ok, launch process is...

Log parsing...

1)  there is a cron job, run.pl, that parses the moz.com and dmo logs.  This cron job needs to be turned off for now.
2)  pull the newest version of http://svn.mozilla.org/projects/spreadfirefox.com/logs-parse/affiliates-download-counting/
3)  configure this script with config.pl-dist, and run with `perl run.pl > out.sql`

the config should be self-explanatory; please ask me if you're confused.  days should be 15 (we are parsing 15 days' worth of logs).
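The log-parsing script is Perl and its internals aren't in this bug. As a small hypothetical illustration of what a 15-day setting covers, this Python sketch lists the calendar days in the window; ending the window on 2009-04-29, the day before the 4/30 launch, is an assumption.

```python
from datetime import date, timedelta
from typing import List


def log_days(last_day: date, days: int = 15) -> List[date]:
    """Return the `days` consecutive calendar dates ending on `last_day`,
    oldest first."""
    return [last_day - timedelta(days=offset) for offset in range(days - 1, -1, -1)]


# Hypothetical end date: the day before the 4/30/09 launch window.
window = log_days(date(2009, 4, 29))
print(window[0], window[-1], len(window))
```

This prints `2009-04-15 2009-04-29 15`, i.e. the window starts 14 days before its last day, not 15.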


Drupal launch...

1)  redirect spreadfirefox.com to the outage page, and give us the info for back-door access to the site
2)  backup the DB
3)  svn up
4)  run manual-prereqs.sql on the DB
5)  I'll take some time to change a few things manually
6)  updates.php needs to be run, I can do this
7)  When I give you the word, run cron.php
8)  When I give you the word, run updates.sql on the DB
9)  mary and I will manually update content, etc
10)  when we give the word, flip spreadfirefox.com back to the actual site
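One way to read the ordering above is as a checklist where steps 7, 8 and 10 wait on an explicit go-ahead. The Python sketch below encodes that; the `Step` type, the "webdev" owner label and `steps_until_blocked` are hypothetical names for illustration, not tooling that existed for this launch.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Step:
    owner: str              # "IT" or "webdev" (hypothetical labels)
    action: str
    needs_go: bool = False  # True where the plan says "when I/we give the word"


# Steps 1-10 of the Drupal launch plan, in order.
LAUNCH_STEPS: List[Step] = [
    Step("IT", "redirect spreadfirefox.com to the outage page; share back-door access"),
    Step("IT", "back up the DB"),
    Step("IT", "svn up"),
    Step("IT", "run manual-prereqs.sql on the DB"),
    Step("webdev", "make a few manual changes"),
    Step("webdev", "run updates.php"),
    Step("IT", "run cron.php", needs_go=True),
    Step("IT", "run updates.sql on the DB", needs_go=True),
    Step("webdev", "manually update content"),
    Step("IT", "flip spreadfirefox.com back to the actual site", needs_go=True),
]


def steps_until_blocked(steps: List[Step], approved: Set[int]) -> List[Step]:
    """Return the steps that may run in order, stopping at the first gated
    step whose 1-based number is not in `approved`."""
    ready: List[Step] = []
    for number, step in enumerate(steps, start=1):
        if step.needs_go and number not in approved:
            break
        ready.append(step)
    return ready
```

With no approvals, only steps 1 through 6 are runnable; approving 7 and 8 unlocks everything up to the final DNS flip, which still waits for its own go-ahead.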
Attached file updates.sql
for step #8
Attachment #371896 - Attachment is obsolete: true
Attached file manual-prereqs.sql
for step #4
Done.
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Component: Server Operations: Web Operations → WebOps: Other
Product: mozilla.org → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard