Closed Bug 763105 Opened 12 years ago Closed 12 years ago

QA and deploy BrowserID train-2012.06.08 to production

Categories

(Cloud Services :: Operations: Deployment Requests - DEPRECATED, task)

Hardware: x86_64
OS: Linux
Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: lhilaiel, Assigned: gene)

References

Details

(Whiteboard: [qa+])

Version: c3b7a57df7 (0.2012.06.08.1) branch train-2012.06.08

Tests pass: http://travis-ci.org/#!/mozilla/browserid/builds/1572639

ChangeLog including issues resolved: 
https://github.com/mozilla/browserid/blob/train-2012.06.08/ChangeLog#L1-16

[QA] Suggested additional areas of focus for QA:
  * extensive testing of layout on all devices given rebrand
  * the same layout testing, but with a verbose language such as German

[ops] deployment issues:
  * a new process has been added - "router" - this process must be started up on webheads!
  * a new statsd counter has been added 'browserid.<process type>.uncaught_exception' - we should add a new graph for it!
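
For reference, a minimal sketch of what one increment of that counter looks like on the wire. It assumes the standard statsd counter format ("<name>:<value>|c") sent over UDP. The helper names, and the host and port defaults (127.0.0.1:8125, the usual statsd port), are illustrative assumptions and are not taken from the BrowserID code.

```python
import socket


def uncaught_exception_metric(process_type: str, count: int = 1) -> bytes:
    """Build a statsd counter datagram for the uncaught_exception counter."""
    # statsd counters use the wire format "<name>:<value>|c".
    name = f"browserid.{process_type}.uncaught_exception"
    return f"{name}:{count}|c".encode("ascii")


def send_counter(payload: bytes, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send one datagram to a statsd daemon (host and port are assumptions)."""
    # UDP is fire-and-forget: no reply is expected from the daemon.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
```

A graph for the new counter would then plot one series per process type, for example browserid.router.uncaught_exception.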
Please update http://l10n-preview.diresworb.org/ codebase to the new train.

@mathjazz - new strings are ready for you to extract, please notify the L10n community.

@petef do you need a separate bug for this work?

Please give input on how we can streamline and automate the L10n tasks of a dev -> stage deployment.
(In reply to Lloyd Hilaiel [:lloyd] from comment #0)
> [ops] deployment issues:
>   * a new process has been added - "router" - this process must be started
> up on webheads!

Er, I was hoping we'd have a little bit of an ops review before this hit a deploy. How's this fit into the picture? Is it supposed to be where Zeus sends all browserid.org traffic? Does it reverse proxy to the rest of the stuff?

Should verifier traffic go straight to the verifier or to the router?

Is it ok if we initially keep the existing non-router architecture for this deploy?


(In reply to Austin King [:ozten] from comment #1)
> Please give input on how we can streamline and automate the L10n tasks of a
> dev -> stage deployment.

It's part of our stage push procedure.  https://intranet.mozilla.org/Services/Ops/BrowserID/CodePush#Staging
Assignee: petef → nobody
Assignee: nobody → gene
I've deployed this to stage. sweb1 and web1 seemed to be having heartbeat problems during the deploy. I'm not sure what impact this has.
I've added a fix for the mysql host configuration parameter. This was found when the last train was deployed, but was never merged back into dev. I've scanned for other missed patches that went into the previous train and found none.

Version: 5d0bb6ee95 (0.2012.06.08.2) branch train-2012.06.08
5d0bb6ee95 has been deployed to stage. no heartbeat problems this time.
Depends on: 763586
Eugene - per Comment 1
https://bugzilla.mozilla.org/show_bug.cgi?id=763105#c1

Please upgrade the l10n preview server when you get a chance...

I see this on Stage:
5d0bb6e bump to 0.2012.06.08.2 with mysql host configuration fix
locale svn r106260

and this on l10n:
254b9f0 add host parameter back to mysql configuration - fixes accidental regression in commit 3f4368d2fde6f67075d218c06a393bcf80a9de67
locale svn r106267
(which appears to match the latest in Dev)

Actually, :ozten - which codebase do you want in l10n?
Whiteboard: [qa+]
I've created the puppet scripts and applied them to staging. router is now running and nginx is sending '/' to it. Go ahead and take a look and let me know if it looks ok. I'll start working on the l10n preview server now
I've updated l10n preview with 5d0bb6ee95. Take a look and let me know if it looks ok.
l10n preview hadn't been done correctly when I last commented (puppet problems). These are now fixed and l10n preview looks good now. Feel free to test it.
Let me know when this looks good for deployment to production. I'm assuming this sort of thing would happen in a maintenance window later this week.
Not exactly!
The train in Stage follows a two-week cycle:
https://img.skitch.com/20120411-bxshbj9y14jxm5b3pse1g787ju.png
A bit out of date, but you get the idea.

So, we would want to deploy this to Production the afternoon of 6/20, assuming QA passes all testing between now and then.
Status: NEW → ASSIGNED
:lloyd can we get a formal list of changes due to router?
Example: processes and logging changes for web* boxes
(not sure what other services (sweb*, sign*) were impacted)

I don't really want to crawl all the code here:
https://github.com/mozilla/browserid/pull/1657

thanks.
jbonacci: absolutely.  first thing in the morning.
The /service/browserid-router/run file needs

export BROWSERID_FAKE_VERIFICATION=1

added and the router daemon restarted on the webheads. Gene, can you make that change please? Thanks.
more - 

This configuration is *ONLY* for the staging environment.

For background, by enabling this env var, you expose a wsapi that allows clients to remotely access verification tokens and skip the email loop, which makes it possible for us to do load testing.
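
A minimal sketch of the "staging only" rule, assuming the deployment tooling knows which environment it is targeting. The function name router_env and the environment label "stage" are hypothetical. Only the variable name BROWSERID_FAKE_VERIFICATION and its value come from this bug.

```python
from typing import Dict, Optional

# Variables that must never be set outside staging (from this bug).
STAGE_ONLY_ENV = {"BROWSERID_FAKE_VERIFICATION": "1"}


def router_env(environment: str, base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return the environment for the router process in the given deployment.

    `environment` is a hypothetical label such as "stage" or "production".
    """
    env = dict(base or {})  # copy so the caller's dict is not mutated
    if environment == "stage":
        env.update(STAGE_ONLY_ENV)
    return env
```

Keeping the stage-only variables in one place makes it harder for the fake-verification wsapi to be exposed in production by accident.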
Puppet scripts updated to reflect stage only env variable. Applied the new puppet config and restarted routers in stage.
QA signs off on this (non-shipping) train. 
- verifications done
- unit tests all pass
- load_gen tests run without server code failures:
  - still see occasional errors where the LB returns 5xx or ECONNREFUSED; not sure
    of the root cause for these. Will spend time this train to narrow those down.
  - load_gen was not posting to /verify with bundled certs so we haven't actually
    been forcing much work on the verifier-compute cluster (GH-1770)

The decision was made to derail this train to allow more time for supporting the new router process. Closing this bug as INCOMPLETE.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → INCOMPLETE
Blocks: 767708
No longer blocks: 767708
With this release the content of include.js changes, so the nagios monitoring for the hash of that file needs to be updated. This change has been made in sysadmins r42023.
Also updating watchmouse to the new monitoring hash in sysadmins r42024.
Correction: since this is only in stage, watchmouse doesn't need to be updated. Reverting the change in sysadmins r42028.
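
A minimal sketch of the kind of hash check discussed in the comments above, written as a Nagios-style plugin function. The bug does not say which hash algorithm the real check uses or how it fetches include.js, so SHA-256 over a local file is an assumption, and the function names are illustrative. The return codes follow the usual Nagios plugin convention (0 = OK, 2 = CRITICAL).

```python
import hashlib
from typing import Tuple


def file_digest(path: str, algorithm: str = "sha256") -> str:
    """Return the hex digest of a file, reading it in chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def check_hash(path: str, expected: str, algorithm: str = "sha256") -> Tuple[int, str]:
    """Compare a file's digest with the expected value.

    Returns (exit_code, message) using Nagios plugin exit codes.
    """
    actual = file_digest(path, algorithm)
    if actual == expected:
        return 0, f"OK - {path} matches expected hash"
    return 2, f"CRITICAL - {path} hash is {actual}, expected {expected}"
```

Because the expected hash is pinned, every release that changes include.js needs a matching update to the monitoring configuration, which is the step described above.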