Closed Bug 715728 Opened 13 years ago Closed 12 years ago

QA and deploy BrowserID train-2012.01.05 to production

Categories

(Cloud Services :: Operations: Deployment Requests - DEPRECATED, task)

task
Not set
normal

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: lhilaiel, Assigned: petef)

Details

(Whiteboard: [qa?])

ChangeLog including issues resolved: 
https://github.com/mozilla/browserid/blob/train-2012.01.05/ChangeLog#L1-10

[QA] Suggested areas of focus for QA:
  * Any issues related to email stored in session should be found and re-tested (i.e. removing the email from your account that you authenticated with shouldn't log you out, removing the last email from your account should log you out, etc)
  * Full and complete regression testing across all devices.  There's a substantial amount of code that's changed, and minimal user visible change in this release (178 files changed, 9203 insertions(+), 2947 deletions(-))
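
The session expectations in the first bullet can be written down as a small check for QA. This is only a sketch of the expected behaviour, not the BrowserID implementation: the function names and the account object shape are hypothetical.

```javascript
// Hypothetical model of the session rule from the QA notes above.
// `remainingEmails` is the list of addresses left on the account after a removal.
function sessionSurvivesRemoval(remainingEmails) {
  // The session should stay valid while the account still has at least one
  // address, no matter which address was used to authenticate.
  return remainingEmails.length > 0;
}

// Remove one address and recompute the (assumed) logged-in state.
function removeEmail(account, address) {
  const remaining = account.emails.filter((e) => e !== address);
  return {
    emails: remaining,
    loggedIn: account.loggedIn && sessionSurvivesRemoval(remaining),
  };
}

const account = { emails: ["a@example.com", "b@example.com"], loggedIn: true };
// Remove the address the user authenticated with: should stay logged in.
const afterFirst = removeEmail(account, "a@example.com");
// Remove the last address on the account: should be logged out.
const afterLast = removeEmail(afterFirst, "b@example.com");
console.log(afterFirst.loggedIn, afterLast.loggedIn); // true false
```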

[ops] Manual deployment steps

This baby has SCHEMA CHANGES!


   "CREATE TABLE IF NOT EXISTS user (" +
     "id BIGINT AUTO_INCREMENT PRIMARY KEY," +
-    "passwd CHAR(64) NOT NULL" +
+    "passwd CHAR(64)" +
     ") ENGINE=InnoDB;",
 
   "CREATE TABLE IF NOT EXISTS email (" +
     "id BIGINT AUTO_INCREMENT PRIMARY KEY," +
     "user BIGINT NOT NULL," +
     "address VARCHAR(255) UNIQUE NOT NULL," +
+    "type ENUM('secondary', 'primary') DEFAULT 'secondary' NOT NULL," +
     "FOREIGN KEY user_fkey (user) REFERENCES user(id)" +
     ") ENGINE=InnoDB;",
 
@@ -84,9 +86,10 @@ const schemas = [
     "id BIGINT AUTO_INCREMENT PRIMARY KEY," +
     "secret CHAR(48) UNIQUE NOT NULL," +
     "new_acct BOOL NOT NULL," +
-    "existing VARCHAR(255)," +
+    "existing_user BIGINT," +
     "email VARCHAR(255) UNIQUE NOT NULL," +
-    "ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL" +
+    "ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL," +
+    "FOREIGN KEY existing_user_fkey (existing_user) REFERENCES user(id)" +
     ") ENGINE=InnoDB;",
 ];

Suggested deployment order:

1. remove the NOT NULL clause on user.passwd
2. add email.type
3. add staged.existing_user as a foreign key on user
4. dereference staged.existing (an email) to the user id and update staged.existing_user
5. visually inspect correctness of step 4
6. perform code push
7. wait for QA to validate
8. alter to remove the staged.existing column
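
For step 5, one way to think about the inspection is to sort every staged row into "ok", "pending", or "mismatched". The sketch below does that over in-memory rows. It is an illustration only: the sample data and the inspectStaged helper are made up, and it assumes email.user always points at a valid user row (which the foreign key guarantees), so the user table is not modelled.

```javascript
// Hypothetical in-memory copies of the rows; in production these live in MySQL.
const emailRows = [
  { address: "alice@example.com", user: 1 },
  { address: "bob@example.com", user: 2 },
];
const stagedRows = [
  { id: 10, new_acct: 0, existing: "alice@example.com", existing_user: 1 },  // migrated correctly
  { id: 11, new_acct: 0, existing: "bob@example.com", existing_user: null }, // not migrated yet
  { id: 12, new_acct: 0, existing: "bob@example.com", existing_user: 1 },    // points at the wrong user
  { id: 13, new_acct: 1, existing: null, existing_user: null },              // new account, nothing to migrate
];

// Classify each staged row by comparing existing_user with the owner of the
// address stored in the old `existing` column.
function inspectStaged(staged, emails) {
  const ownerOf = new Map(emails.map((e) => [e.address, e.user]));
  const report = { ok: [], pending: [], mismatched: [] };
  for (const row of staged) {
    if (row.new_acct !== 0 || row.existing === null) continue; // out of scope for step 4
    const expected = ownerOf.get(row.existing);
    if (expected === undefined) continue; // address is unknown, nothing to dereference
    if (row.existing_user === null) report.pending.push(row.id);
    else if (row.existing_user === expected) report.ok.push(row.id);
    else report.mismatched.push(row.id);
  }
  return report;
}

const report = inspectStaged(stagedRows, emailRows);
console.log(report); // ok: [10], pending: [11], mismatched: [12]
```

Anything in "mismatched" would need a look before moving on to the code push; "pending" rows are expected to shrink to zero after the data migration.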
Assignee: nobody → petef
No longer depends on: 711267
QA Contact: operations-deploy-requests → jbonacci
Along with the QA list up top, we will also focus on the following:
  * Short-term high-ADU load test (4 - 8 hours)
  * Long-term lower-ADU load test (2 days)
  * Android devices (2.x - 4.x), with emphasis on the stock browser and the 4 channels of FF
  * iOS5 devices
  * Mac, Win, and Linux with the 4 channels of FF
  * Win7 with IE9
  * WinXP with IE7 and IE8

We will be skipping other browsers for this release.
Looks like we will begin this work Monday morning PST after deployment to Stage.
This is deployed in stage.

SQL/data note:  email.type is now 'secondary' on all old rows.

push procedure:

# run these pt-online-schema-changes on slaves first, then master.

pt-online-schema-change D=browserid,t=user \
  --alter "modify passwd char(64)" \
  --child-tables email \
  --update-foreign-keys-method rebuild_constraints \
  --progress percentage,5

# we always have to re-add user_fkey because of a bug/feature
pt-online-schema-change D=browserid,t=email \
  --alter "add type ENUM('secondary', 'primary') DEFAULT 'secondary' NOT NULL; add foreign key user_fkey (user) references user(id);" \
  --progress percentage,5

pt-online-schema-change D=browserid,t=staged \
  --alter "add existing_user BIGINT; add foreign key existing_user_fkey (existing_user) references user(id)" \
  --progress percentage,5

# migrate data from staged.existing to staged.existing_user
MASTER|mysql> update staged, email, user set staged.existing_user=user.id
       where staged.existing_user is null and staged.existing is not null
       and staged.new_acct = 0 and staged.existing = email.address and
       email.user = user.id

### Push new code now.

# pick up any missed data
MASTER|mysql> update staged, email, user set staged.existing_user=user.id
       where staged.existing_user is null and staged.existing is not null
       and staged.new_acct = 0 and staged.existing = email.address and
       email.user = user.id

# --- we are here, in staging ---

# If things go well, eventually delete backups (slaves & master):
# mysql> drop table __old_user, __old_email, __old_staged

# and then drop the old existing field:

pt-online-schema-change D=browserid,t=staged \
  --alter "drop existing" \
  --progress percentage,5

# If things go well, eventually delete backups:
# mysql> drop table __old_staged
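
A note on why the same UPDATE is run twice in the procedure above: the "existing_user is null" guard makes it safe to repeat, so the second run only touches rows that the old code staged between the first run and the code push. The sketch below models that in memory. The backfillExistingUser helper and the sample rows are hypothetical, and the join through the user table is collapsed into email.user on the assumption that the foreign key keeps it valid.

```javascript
// In-memory model of the UPDATE: fill staged.existing_user from the owner of
// the address in staged.existing, for rows that have not been filled yet.
function backfillExistingUser(staged, emails) {
  const ownerOf = new Map(emails.map((e) => [e.address, e.user]));
  let updated = 0;
  for (const row of staged) {
    if (row.existing_user !== null) continue;                  // already migrated
    if (row.existing === null || row.new_acct !== 0) continue; // same filters as the WHERE clause
    if (!ownerOf.has(row.existing)) continue;                  // no matching email row
    row.existing_user = ownerOf.get(row.existing);
    updated += 1;
  }
  return updated; // number of rows changed, like MySQL's "Rows matched/changed"
}

const emailTable = [
  { address: "alice@example.com", user: 1 },
  { address: "bob@example.com", user: 2 },
];
const stagedTable = [
  { id: 1, new_acct: 0, existing: "alice@example.com", existing_user: null },
];

// First pass, before the code push.
const firstPass = backfillExistingUser(stagedTable, emailTable);

// The old code stages one more row while the push is in progress.
stagedTable.push({ id: 2, new_acct: 0, existing: "bob@example.com", existing_user: null });

// Second pass picks up only the missed row; a third pass changes nothing.
const secondPass = backfillExistingUser(stagedTable, emailTable);
const thirdPass = backfillExistingUser(stagedTable, emailTable);
console.log(firstPass, secondPass, thirdPass); // 1 1 0
```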
Status: NEW → ASSIGNED
Awesome work from Ops. QA picks this train up for testing in Stage.
Processes, logs, versions, and heartbeats all look good.
Moving on to bug verification and a 4-hour load test.
Started bug verification by updating most of the RP-related bugs on the RP GitHub site for myfavoritebeer.org (booze and show have 0 issues).
HotFix launched this evening:
Code change: 0.2012.01.05-2
15:39 < GitHub158> [browserid] lloyd pushed 2 new commits to train-2012.01.05: http://git.io/8IOV5w
15:39 < GitHub158> [browserid/train-2012.01.05] explicitly call .removeAllListeners() during http forwarding to eliminate memory leak.  closes #839 (with extreme prejudice) - Lloyd Hilaiel
15:39 < GitHub158> [browserid/train-2012.01.05] update version and ChangeLog with memory leak hotfix for train-2012.01.05 - Lloyd Hilaiel

https://diresworb.org/ver.txt
0051864 update version and ChangeLog with memory leak hotfix for train-2012.01.05

Starting up a load test with the following settings:
node bin/load_gen -s https://stage-browserid.services.mozilla.com -o -m 750000 -u 1/750
Bug verification is complete.
Testing continues today on Android devices.
Testing continues today using load_gen in order to debug an issue found:
875: load_gen shows decline in QPS over time
We are in pretty good shape for a last-minute sign off pending triage of the following issue:
875: load_gen shows decline in QPS over time
Based on lloyd's comments to 875, QA signs off on this train.
Handing over to petef for deployment to Prod this afternoon.
pushed. leaving bug open for schema change cleanup (dropping backup tables and staged.existing).
QA signs off on the push to production.
Blocks: 719243
Whiteboard: [qa+]
Whiteboard: [qa+] → [qa?]
staged.existing finally dropped
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Status: RESOLVED → VERIFIED
Blocks: 723755
No longer blocks: 723755