Launch Spread Firefox site redesign

RESOLVED FIXED

Status

Infrastructure & Operations
WebOps: Other
--
major
RESOLVED FIXED
9 years ago
4 years ago

People

(Reporter: abuchanan, Assigned: aravind)

Tracking

other
All
Other
Bug Flags:
needs-downtime +

Details

(Whiteboard: launch date: 4/30/09, URL)

Attachments

(2 attachments, 1 obsolete attachment)

(Reporter)

Description

9 years ago
Hey IT,

This is a "heads up" bug for launching the work done on the Spreadfirefox.com redesign.

Right now, the tentative date for launch is March 11th.

This will most likely cause site downtime, either by locking out all visitors, or (hopefully) only blocking logged in users.  I don't have an estimate on how long the transition will take yet.

I'll post more throughout today and tomorrow as I solidify my plan, but here is the rough draft as it sits in my head right now...

Pre-game)  Come up with content (and a method for displaying it) to tell users why the site is about to be down and what to expect.

1) Check out the redesign code, newly tagged for production, in a new directory on the prod cluster and give it a DNS path.  This way we can leave the old site running while I set up changes on the new sandboxed-prod version.

2)  Clear existing sessions and prevent new users from logging in, so that no important DB writes are made and no content is changed.

3)  Copy the production DB to a sandboxed-prod version and add this to the sandboxed-prod config.  The sandboxed-prod version should now be usable for setting things up without affecting the public site.

4)  I'll run through all the Drupal settings manually.  Drupal makes this part a PITA, so this could take a little while.

5)  We do a quick QA run through, make sure things look OK.

6)  Mary, Stephen (or a delegate of his choosing), and myself all explicitly sign off on throwing the switch in this bug.

7)  We flip the Apache/DNS config from the old directory to the new one, unlock user logins, and pray :)

Please, please let me know if anything sounds unclear and/or you have questions or concerns.  This part is always tricky (especially with Drupal), so I'm trying to make it as clear and uneventful as possible.

Also, if you have a suggestion on when is a good time (am/pm) to do this, I'd love it.

That's all I've got for now, Thanks!

Oh, almost forgot, note-to-self:  figure out how to lock-down user logins

Comment 1

9 years ago
Target is now March 17th!
Calendar says oremj is oncall that week.
Flags: needs-downtime+
Whiteboard: Tuesday, March 17
(Reporter)

Comment 3

9 years ago
The launch has been pushed back from Mar. 17th.

I'll update with the new target after our meeting today.

Updated

9 years ago
Whiteboard: Tuesday, March 17

Comment 4

9 years ago
What's the new target date?

Comment 5

9 years ago
Hey there:  Ideally before the quarter is out, but need QA's input.

Updated

8 years ago
Whiteboard: waiting release date
(Reporter)

Comment 6

8 years ago
New target release date is April 7th

A couple things have changed in the launch process...

1)  Instead of trying to lock out users from logging in, we've decided to take the safer/easier route and lock the whole site down.  IT can do this by redirecting spreadfirefox.com to a static site maintenance message.  Mary is working on this content.

2)  Instead of trying to do the changes by hand (silly me), I'll have IT run an updates.sql file, which should take care of most (all?) of the work.
Whiteboard: waiting release date

Updated

8 years ago
Whiteboard: 04/07/2009

Comment 7

8 years ago
Sounds like IT/Webdev have a good handle on the launch transition (from comment #0), but just want to make sure we have a backout plan if things end up busted.

I think we have enough visitors to SFX to ensure a quick recovery if anything goes wrong.  I'm assuming the production DB is going to be backed up and that the code in SVN will be tagged before we check in the new code for launch (which is what I'm guessing the sandboxed_prod stuff is for).

Just wanted to throw that in there... I'm sure you fellas have this all covered. ;-)

Comment 8

8 years ago
Hi all:  To confirm, we'd like to start the migration at 7:00 p.m. PDT on April 7th during non-peak hours.  Thanks!

Comment 9

8 years ago
Also, recap of launch steps here:  https://wiki.mozilla.org/Launch_procedure#Spread_Firefox_Redesign_Launch_Details

Updated

8 years ago
Blocks: 487148
(Assignee)

Comment 10

8 years ago
Can you guys please recap for me what needs to happen on the server side to do this?

From the first comment (0) and the link in comment #9, I know I have to do this for sure.

1) Put up a site down (maintenance) page.

2) Backup the db

3) update code

.....

Remove maintenance page.

The intermediate steps there are a blur to me.  Comment 0 seems to indicate that we need a whole new db, new site etc.  Is all that still needed?

If it is, why can't all that be done/staged beforehand?
(Reporter)

Comment 11

8 years ago
sorry, let me clarify...

1)  Shut down the site by redirecting all spreadfirefox.com traffic to the outage page in bug 487148.  We need an IP or URL where we can access the site for any manual setup and QA that needs to be done before switching the site back to public.

2) back up the DB

3) update the code to tags/production@HEAD

4) We'll need a new, separate database to hold affiliates stats.  This will require creating it, uploading the schema (I will attach here shortly), and adding the config to Drupal's settings.php

  * you should be able to merge in the config changes from settings.php.dist.  It's only like 3 lines, it adds the mysql URI for the new DB

5) Run updates.sql (I'm still working on this)

6) At this point IT is on hold while we do the final QA round.

7) When Mary, Stephen or Krupa (QA), and I have all signed off here, IT can switch spreadfirefox.com DNS from the outage page back to the site.

I think that clears up everything, did I miss anything?

Comment 12

8 years ago
What's the downtime estimate?
(Reporter)

Comment 13

8 years ago
we're shooting for 1 hour

Comment 14

8 years ago
Hey there:  We're running about an hour behind.  Aravind:  We'll ping you as soon as we are ready to take the site down.

Is the redirect page set up and ready to go?

Thanks!
(Reporter)

Comment 15

8 years ago
For the record, things came up that forced us to push back the launch.  Our new date is Thursday, April 9th (afaik)
Whiteboard: 04/07/2009 → 04/09/2009
(Reporter)

Comment 16

8 years ago
Note to self:  Get recaptcha keys for spreadfirefox.com before launch
(Assignee)

Updated

8 years ago
Assignee: server-ops → aravind
(Reporter)

Comment 17

8 years ago
Created attachment 371896 [details]
affiliates archive DB schema
(Reporter)

Comment 18

8 years ago
We're not launching tonight, because of affiliates issues.  We will regroup on Tuesday with a new date.

sorry for any confusion.
Whiteboard: 04/09/2009 → tbd
(Assignee)

Comment 19

8 years ago
Going back to the general queue for now.
Assignee: aravind → server-ops
(Assignee)

Comment 20

8 years ago
Please re-open the bug when you have a firm ETA on when this will go live.
Status: NEW → RESOLVED
Last Resolved: 8 years ago
Resolution: --- → INCOMPLETE
(Reporter)

Comment 21

8 years ago
Hey,

Reopening as we're ready to launch during Thursday's launch window (4/30/09)

I've simplified the launch process a bit.  For one, there will be no separate affiliates DB anymore (this was confusing before).  I've also tried to compile everything into one update script.

I'll post more details soon, I just wanted to get this on the radar.

Also, part of this will be making up 15 days of Affiliates points by parsing traffic logs for moz.com and download.m.o.  How long would you estimate this will take?  10min? an hour?
Severity: minor → major
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
Whiteboard: tbd → launch date: 4/30/09
(Reporter)

Comment 22

8 years ago
Also, I was told yesterday that caching on the NS (Netscaler) creates problems with server-side user agent detection.

This launch will push code that includes server-side user agent detection with PHP.  We use this to detect whether a user has Firefox < 3, or a non-FF browser, so that we know when to show a download button.

One solution mentioned is that the NS can be configured to work around this by caching based on URL + user agent.  The other is to use javascript to do the detection.
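To make the two ideas above concrete, here is a minimal sketch in Python (the site itself uses PHP). It is not the code being launched: the function names and the `Firefox/(\d+)` pattern are assumptions for illustration, and real user agent parsing has more edge cases. It shows the download-button decision and a cache key that varies by that decision, so a cached page built for one class of browser is never served to the other. Keying on the full user agent string, as described above, would also work, but it splits the cache into many more entries.

```python
import re

# Simplified pattern (assumption): major version taken from a "Firefox/<n>" token.
FIREFOX_VERSION = re.compile(r"Firefox/(\d+)")


def needs_download_button(user_agent: str) -> bool:
    """Return True for non-Firefox browsers and for Firefox older than 3."""
    match = FIREFOX_VERSION.search(user_agent)
    if match is None:
        return True  # not Firefox at all
    return int(match.group(1)) < 3


def cache_key(url: str, user_agent: str) -> str:
    """Build a cache key from the URL plus the user agent class.

    Hypothetical helper: the two classes are the only thing the page
    output depends on in this sketch.
    """
    variant = "download" if needs_download_button(user_agent) else "no-download"
    return f"{url}|{variant}"
```

With a URL-only key, whichever browser hits a page first decides what every later visitor sees, which is presumably the problem that was reported.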

Since we don't have time to rewrite the detection code in JS, I'd like to use the NS config work-around for now.  IT, do you know how to do this?  Anything you need from me?

Also, the stage sites are behind the NS right?  No one had this problem on stage.  Any idea why?

Thanks

Comment 23

8 years ago
Hi IT:  We're ready to launch tomorrow night at 7 p.m.  Can someone get back to Alex re. the NS config. work-around question?  Thanks!

Comment 24

8 years ago
> Also, the stage sites are behind the NS right?  No one had this problem on
> stage.  Any idea why?

Who told you the production Netscaler would cause problems?  Stage sits behind the same pair of Netscalers as production does.
(Reporter)

Comment 25

8 years ago
Ryan mentioned it caused problems with UA detection on AMO before.  Maybe he meant only Zeus?

Comment 26

8 years ago
(In reply to comment #25)
> Ryan mentioned it caused problems with UA detection on AMO before.  Maybe he
> meant only Zeus?

I was talking about the Netscalers. If stage is working and uses the same configuration and Netscalers as production, then it should be fine.

Comment 27

8 years ago
AMO behind the Netscaler has a lot of custom cache/no-cache rules.  sfx doesn't have those, but like Ryan said, if it's working on stage it'll work in production.
(Reporter)

Comment 28

8 years ago
sweet, thanks guys

Comment 29

8 years ago
It should be fine, then -- I've been testing on staging, and verified the download-button bug without seeing any issues.

Comment 30

8 years ago
IT:  Can we get an answer to comment #21...

"Also, part of this will be making up 15 days of Affiliates points by parsing
traffic logs for moz.com and download.m.o.  How long would you estimate this
will take?  10min? an hour?"

Thanks!

Comment 31

8 years ago
I'm not really sure how to answer, because it all depends on how fast the log processing script is.  There are about 100G of data per day for mozilla.com and about 30G a day for download.mozilla.org, so over 15 days that is ~1.5 terabytes of mozilla.com and ~500G of download.mozilla.org data.  The stats scripts I have seen have all been CPU bound or database bound.

Comment 32

8 years ago
download.mozilla.org logs are ~100,000,000 lines a day.  I'll run a line count on mozilla.com logs as well.
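
As a rough back-of-the-envelope check on the figures quoted above, the sketch below multiplies them out for the 15-day window. The daily sizes and line count come from this bug; the throughput value is purely hypothetical (nobody here has measured run.pl), so the resulting hours only show how the estimate would be computed once a real rate is known.

```python
DAYS = 15  # days of logs to re-parse, per comment 21

# Daily log volumes quoted in this bug, in gigabytes.
GB_PER_DAY = {"mozilla.com": 100, "download.mozilla.org": 30}

# Daily line count quoted in this bug for download.mozilla.org.
DMO_LINES_PER_DAY = 100_000_000

# ASSUMPTION: hypothetical parser throughput; time run.pl on a sample instead.
ASSUMED_LINES_PER_SECOND = 50_000


def total_gb(site: str, days: int = DAYS) -> int:
    """Total log volume for one site over the whole window, in GB."""
    return GB_PER_DAY[site] * days


def parse_hours(lines: int, lines_per_second: float) -> float:
    """Hours needed to parse `lines` at a constant rate."""
    return lines / lines_per_second / 3600


if __name__ == "__main__":
    for site in GB_PER_DAY:
        print(f"{site}: {total_gb(site)} GB over {DAYS} days")
    dmo_lines = DMO_LINES_PER_DAY * DAYS
    hours = parse_hours(dmo_lines, ASSUMED_LINES_PER_SECOND)
    print(f"download.mozilla.org: {dmo_lines:,} lines, about {hours:.1f} h at the assumed rate")
```

The download.mozilla.org total works out to 450 GB, which the comment above presumably rounds to ~500G.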
(Assignee)

Updated

8 years ago
Assignee: server-ops → aravind
Status: REOPENED → ASSIGNED
(Reporter)

Comment 33

8 years ago
ok, launch process is...

Log parsing...

1)  There is a cron job running, called run.pl, that parses moz.com and dmo logs.  This cron job needs to be turned off for now.
2)  pull the newest version of http://svn.mozilla.org/projects/spreadfirefox.com/logs-parse/affiliates-download-counting/
3)  configure this script with config.pl-dist, and run with `perl run.pl > out.sql`

the config should be self-explanatory, please ask me if you're confused.  days should be 15 (we are parsing 15 days worth of logs)


Drupal launch...

1)  redirect spreadfirefox.com to the outage page, and give us the info for back-door access to the site
2)  backup the DB
3)  svn up
4)  run manual-prereqs.sql on the DB
5)  I'll take some time to change a few things manually
6)  updates.php needs to be run, I can do this
7)  When I give you the word, run cron.php
8)  When I give you the word, run updates.sql on the DB
9)  mary and I will manually update content, etc
10)  when we give the word, flip spreadfirefox.com back to the actual site
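
The Drupal launch list above has three points where IT has to wait for a go-ahead (steps 7, 8 and 10). One way to keep that straight during the window is to write the list down as data. This is only an illustrative sketch: the `Step` type, the `waits_for_word` flag and `next_step` are made-up names, not part of any script in SVN, and the step text is transcribed from the list above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Step:
    number: int
    action: str
    waits_for_word: bool = False  # True when the step needs an explicit go-ahead


# Transcribed from the Drupal launch list in this comment.
DRUPAL_LAUNCH = [
    Step(1, "redirect spreadfirefox.com to the outage page"),
    Step(2, "back up the DB"),
    Step(3, "svn up"),
    Step(4, "run manual-prereqs.sql on the DB"),
    Step(5, "manual changes by webdev"),
    Step(6, "run updates.php"),
    Step(7, "run cron.php", waits_for_word=True),
    Step(8, "run updates.sql on the DB", waits_for_word=True),
    Step(9, "manually update content"),
    Step(10, "flip spreadfirefox.com back to the actual site", waits_for_word=True),
]


def next_step(done: int, word_given: bool) -> Optional[Step]:
    """Return the next step to run after `done` completed steps.

    Returns None when the list is finished, or when the next step
    is gated and the go-ahead has not been given yet.
    """
    if done >= len(DRUPAL_LAUNCH):
        return None
    step = DRUPAL_LAUNCH[done]
    if step.waits_for_word and not word_given:
        return None
    return step
```

For example, `next_step(6, word_given=False)` returns `None` because step 7 (cron.php) is gated, while `next_step(6, word_given=True)` returns that step.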
(Reporter)

Comment 34

8 years ago
Created attachment 375283 [details]
updates.sql

for step #8
Attachment #371896 - Attachment is obsolete: true
(Reporter)

Comment 35

8 years ago
Created attachment 375284 [details]
manual-prereqs.sql

for step #4
(Assignee)

Comment 36

8 years ago
Done.
Status: ASSIGNED → RESOLVED
Last Resolved: 8 years ago
Resolution: --- → FIXED
Component: Server Operations: Web Operations → WebOps: Other
Product: mozilla.org → Infrastructure & Operations