Closed Bug 851000 Opened 12 years ago Closed 12 years ago

please deploy browserid 0.2013.03.01 to production

Categories

(Cloud Services :: Operations: Deployment Requests - DEPRECATED, task)

x86_64
Linux
task
Not set
normal

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: jrgm, Assigned: gene)

References

Details

sha: 5381817d1d325569ed6f5fcee771c1ff810af7b5
locale svn r113972
Changelog: https://github.com/mozilla/browserid/blob/train-2013.03.01/ChangeLog#L1-L11
Schedule: http://personatra.in/#2013.03.01

Which is available on r6 at
~jrgm/workspace/browserid/rpmbuild/RPMS/x86_64/browserid-server-0.2013.03.01-1.el6_113972.x86_64.rpm
12:51 scl2 drained
12:56 load balancers at scl2 drained to stop traffic at scl2
12:58 install started at scl2
1:06 scl2 updated; please test and approve
Target sha from above is 5381817, locale svn r113972

From a client with *login.persona.org resolving to scl2:

# for i in `seq 100`; do curl -s https://login.persona.org/ver.txt ; sleep 0.1; done | sort | uniq -c
    100 5381817 Changes in 2013.03.01
    100 locale svn r113972

# for i in `seq 100`; do curl -s https://login.persona.org/sign_in | egrep 'Current|dialog.js|dialog.css'; sleep 0.1; done | sort | uniq -c
    100 <script src="https://static.login.persona.org/v/04e949c3a7/production/en/dialog.js"></script>
    100 <link href="https://static.login.persona.org/v/844b6f3f18/production/ie8_dialog.css" rel="stylesheet" type="text/css">
    100 - Current Commit: https://github.com/mozilla/browserid/commit/5381817
    100 <link href="https://static.login.persona.org/v/a7b294ecd6/production/en/dialog.css" rel="stylesheet" type="text/css">

# for i in `seq 40`; do curl -s https://login.persona.org/.well-known/browserid; echo; sleep 0.5; done | sort | uniq -c
    40 {"public-key":{...}}

- verifier is responding in scl2 to requests
- language spot check - de, zh-TW, in particular this time
- Can successfully signin/signout/signin with:
    https://mozillians.org/en-US/
    https://developer.mozilla.org/en-US/
    https://affiliates.mozilla.org/en-US/new
    https://marketplace.firefox.com/
    https://moztrap.mozilla.org/results/runs/
    https://id.etherpad.mozilla.org/
    https://www.voo.st/events
    https://current.trovebox.com/
    http://crossword.thetimes.co.uk/#
    http://int1.dolumar.com/
    https://bugzilla.mozilla.org/index.cgi
    http://myfavoritebeer.org/
    http://123done.org/
- completed basic functional flows (signup, signin, change password, forget password, add user, remove user, cancel account)
- reviewed pencil graphs for errors or abnormal values. Looks nominally clear atm.
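The `sort | uniq -c` checks above are read by eye: every distinct line should show the full sample count, and the target sha should be among them. A minimal sketch of automating that reading follows. The helper name `all_match` is hypothetical (not part of any browserid tooling), and it assumes the `uniq -c` output format shown above, with the count in the first field.

```shell
# all_match EXPECTED_COUNT PATTERN   (hypothetical helper, for illustration only)
# Reads `sort | uniq -c` output on stdin. Succeeds only if every distinct
# line was seen exactly EXPECTED_COUNT times and at least one line contains
# PATTERN as a literal substring. A mixed deploy (e.g. 60 responses with the
# new sha and 40 with the old) fails because the counts fall short.
all_match() {
  awk -v n="$1" -v pat="$2" '
    $1 != n            { bad++ }   # some webhead returned a different body
    index($0, pat) > 0 { hit++ }   # expected sha (or other marker) was seen
    END { exit !(NR > 0 && bad == 0 && hit > 0) }
  '
}

# Usage against the ver.txt loop from this comment (needs network access):
#   for i in `seq 100`; do curl -s https://login.persona.org/ver.txt; sleep 0.1; done \
#     | sort | uniq -c | all_match 100 5381817 && echo "scl2 is consistently on 5381817"
```

This only confirms that the sampled responses agree with each other and with the target sha; it does not replace the manual signin checks against the RPs.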
Gene, you can turn up traffic to SCL2 and bleed off PHX1. (But can we just watch the requests fade off there for about 10 minutes before closing the door?)
1:49 traffic enabled at scl2 and drained at phx1
Having some issues with bugzilla persona login at the moment. Gene, please move traffic back to phx1 now.
2:00 traffic re-enabled back at PHX1 (old code) and disabled at SCL2 after problems were seen with bugzilla persona logins
3:01 switched back to SCL2 to test again
3:04 switched back to PHX1 after again seeing problems
3:27 scl2 code rolled back to 02.15; please test and approve
Previous train-2013.02.15 was "d217289 changes in 2013.02.15", locale svn r113164, and we're back on that in scl2.
https://bugzilla.mozilla.org/show_bug.cgi?id=845897#c3

From a client with *login.persona.org resolving to scl2:

# for i in `seq 100`; do curl -s https://login.persona.org/ver.txt ; sleep 0.1; done | sort | uniq -c
    100 d217289 changes in 2013.02.15
    100 locale svn r113164

# for i in `seq 100`; do curl -s https://login.persona.org/sign_in | egrep 'Current|dialog.js|dialog.css'; sleep 0.1; done | sort | uniq -c
    100 <script src="https://static.login.persona.org/v/ff31d9b312/production/en/dialog.js"></script>
    100 <link href="https://static.login.persona.org/v/844b6f3f18/production/ie8_dialog.css" rel="stylesheet" type="text/css">
    100 - Current Commit: https://github.com/mozilla/browserid/commit/d217289
    100 <link href="https://static.login.persona.org/v/4013b92760/production/en/dialog.css" rel="stylesheet" type="text/css">

# for i in `seq 40`; do curl -s https://login.persona.org/.well-known/browserid; echo; sleep 0.5; done | sort | uniq -c
    40 {"public-key":{...}}

- verifier is responding in scl2 to requests
- language spot check - de, zh-TW, in particular this time
- Can successfully signin/signout/signin with:
    https://mozillians.org/en-US/
    https://developer.mozilla.org/en-US/
    https://affiliates.mozilla.org/en-US/new
    https://marketplace.firefox.com/
    https://moztrap.mozilla.org/results/runs/
    https://id.etherpad.mozilla.org/
    https://www.voo.st/events
    https://current.trovebox.com/
    http://crossword.thetimes.co.uk/#
    http://int1.dolumar.com/
    https://bugzilla.mozilla.org/index.cgi
    http://myfavoritebeer.org/
    http://123done.org/
- completed basic functional flows (signup, signin, change password, forget password, add user, remove user, cancel account)
- reviewed pencil graphs for errors or abnormal values. Looks nominally clear atm.
Gene, you can return traffic to scl2.
4:08 traffic re-enabled at scl2. We're now rolled back code-wise and traffic is live in both DCs.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → INCOMPLETE
To summarize the reason for the rollback: when we moved traffic into SCL2 and started to drain PHX1, problems were immediately noticed for signins to bugzilla.mozilla.org (but not to any of the other RPs that I tested). The signin, at the point where bugzilla needs to contact our Persona verifier service (now listed by dynect DNS to be in SCL2), would just hang and after thirty seconds would return a 500 with a Service Unavailable page from Zeus. Gene and others eventually traced this to an existing, partially known issue with Bugzilla. Root cause was an incorrect networking configuration that had been preventing b.m.o from reaching Persona in the SCL2 datacenter since last Saturday, I believe. There was no known issue with this train per se, and we can regroup and try this again on Monday.
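Since the failure only showed up on the path from the RP's servers to the verifier, a reachability probe run from the RP host itself (not from a test client) would have surfaced it before traffic was moved. A rough sketch, under assumptions: the function names `classify_probe` and `probe_verifier` are hypothetical, the verifier URL is passed in by the caller, and the 10-second limit is an arbitrary choice. It relies only on standard curl behaviour (`--max-time`, `-w '%{http_code}'`, and exit status 28 for a timed-out operation).

```shell
# classify_probe HTTP_CODE CURL_EXIT   (hypothetical helper, for illustration only)
# Turns a curl result into a one-word verdict.
classify_probe() {
  code=$1
  rc=$2
  if [ "$rc" -eq 28 ]; then
    echo "timeout"         # curl exit 28: the operation timed out (a hang)
  elif [ "$rc" -ne 0 ]; then
    echo "connect-error"   # any other curl failure (DNS, refused, TLS, ...)
  elif [ "$code" -ge 500 ]; then
    echo "server-error"    # got an answer, but it was a 5xx
  else
    echo "ok"
  fi
}

# probe_verifier URL
# Run this from the RP's own host so the request takes the same network
# path the RP uses. URL is whatever verifier endpoint the RP is configured with.
probe_verifier() {
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1")
  rc=$?
  classify_probe "$code" "$rc"
}
```

A result of `timeout` or `connect-error` from the RP host, while the same probe succeeds from elsewhere, points at the network path and not at the train being deployed.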
Status: RESOLVED → VERIFIED
Status: VERIFIED → RESOLVED
Closed: 12 years ago → 12 years ago
Here is the ticket for the bugzilla root cause: Bug 851376
Depends on: 851751
So an issue was found that warranted an update to train-2013.03.01, and we will need to retest that build before deployment (probably not on Monday). I will file a new request to deploy to production when we have that build ready - see bug 851751 for the stage deployment of the updated branch.
Depends on: 852844