
Deploy and test a single node for sync1.1 storage version 1.18-1

VERIFIED FIXED

Status

Product: Cloud Services
Component: Operations: Deployment Requests
Status: VERIFIED FIXED
Reported: 3 years ago
Modified: 3 years ago

People

(Reporter: rfkelly, Assigned: bobm)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment)

(Reporter)

Description

3 years ago
In preparation for sync1.1 EOL, we need to deploy and test the EOL-header-sending code from Bug 1110013.  For a variety of hysterical raisins (e.g. no surviving build environment, no stage hardware) we can't do this like a standard deploy so we'll have to special-case it.

The plan, per discussion with :bobm:

  * I've tagged rpm-1.18-1 on http://hg.mozilla.org/services/server-storage/ with the EOL-header-sending code and no other changes.

  * Rather than building a fresh RPM, we'll take the existing rpm-1.17-2 rpms and rebuild them to incorporate the updated code and version number.  This involves replacing only the syncstorage python code - no dependencies need to be touched and nothing needs to be recompiled.

  * For testing, we take a single production webhead out of the loadbalancer and push the new code to it.  We can then use loadbalancer rules to direct specific user accounts to that node for some good old-fashioned manual QA.

To QA this thing we'll need to try it with various combinations of flags and client behaviour:

  * sanity-check that no headers are sent by default
  * with eol_general_release off, try it with nightly versus other browsers and confirm that we only see the headers on nightly
  * with eol_general_release on, try it with various browsers and ensure they all get the header
  * use accounts with known uids to test the uid-based rollout percentage logic at different percentages
  * walk through a rollback of the rollout percentage, so that we can check whether the high-water-mark logic is working correctly.

We can flesh out the details of each of these checks as they come up.  Any others I've missed at this high level?
QA Contact: kthiessen
This looks good to me; I'll set up a short meeting this week in the Australian morning/PST afternoon with :rfkelly and :bobm to walk through various bits and ask questions.
(Assignee)

Comment 2

3 years ago
The currently installed version of Sync 1.1 is syncstorage-1.15-7.
(Reporter)

Comment 3

3 years ago
There's a lot of churn between that and the current tag, but it looks like it's all irrelevant to our change.  I will branch from the current release tag and apply the patches there, creating a new release in the 1.15 series.

Comment 4

76gzqkmx.Migrate001/prefs.js:user_pref("services.sync.username", "ntl7liwvfw3lwekjdjqjl3cbsz5txy2s");
a2ntf8tk.Migrate002/prefs.js:user_pref("services.sync.username", "zcbzhlho2jti2ucokjlcevfrt2vn3x7o");
s1qxnvr0.Migrate004/prefs.js:user_pref("services.sync.username", "6nqb637fllwn7rjzzi56syn5pums2u3m");
wia205vk.Migrate006/prefs.js:user_pref("services.sync.username", "p55rhekayiqjbvsgp3kgjmt3wrjagkhm");
zp5568j8.Migrate005/prefs.js:user_pref("services.sync.username", "os5k46bzyope4tbxvueo445ypx7vuzvb");
(Assignee)

Comment 5

3 years ago
(In reply to Karl Thiessen [:kthiessen] from comment #4)
Those usernames translate to the following UIDs:

ntl7liwvfw3lwekjdjqjl3cbsz5txy2s 13479301
zcbzhlho2jti2ucokjlcevfrt2vn3x7o 13479305
6nqb637fllwn7rjzzi56syn5pums2u3m 13479308
os5k46bzyope4tbxvueo445ypx7vuzvb 13479311
p55rhekayiqjbvsgp3kgjmt3wrjagkhm 13479313
(Reporter)

Updated

3 years ago
Depends on: 1119110
(Reporter)

Comment 6

3 years ago
Alrighty, I have made a "rpm-1.15-9" that backports Bug 1110013 and Bug 1119110 onto the latest release branch.  There was an rpm-1.15-8 version that was never released, but it contains only a trivial change which I'm happy to ship as part of this.

Bob, over to you for an attempt at building these RPMs.
(Assignee)

Comment 7

3 years ago
The python26-syncstorage-1.15-9.noarch.rpm has been built.  I'll detail the process here because it departs slightly from the standard build process.  The upside is that the resulting RPM is as close to the current running environment as possible.

1. The python26-syncstorage-1.15-7.noarch.rpm was disassembled using a combination of rpm2cpio and cpio -imvd.
2. The spec file was recovered using the rpmrebuild command.
3. The spec file was edited to change the current release and buildroot.
4. The four files that were changed in the rpm-1.15-9 tag were copied over.
5. The python byte code files for those files were removed, and regenerated using the py_compile module.
6. Everything was repackaged using rpmbuild -bb.
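
Step 5 could be sketched in Python roughly as below.  This is an illustration only, not the commands that were actually run: the helper name regenerate_bytecode is made up here, and it assumes the legacy layout in which foo.pyc sits next to foo.py (as Python 2.6 does by default).

```python
import os
import py_compile


def regenerate_bytecode(source_paths):
    """Drop stale .pyc files and recompile each source file next to itself."""
    compiled = []
    for src in source_paths:
        pyc = src + "c"  # legacy layout: foo.py -> foo.pyc in the same directory
        if os.path.exists(pyc):
            os.remove(pyc)  # byte code left over from the previous release
        # doraise=True raises on a syntax error instead of only printing it,
        # so a broken file is caught before it is packaged.
        py_compile.compile(src, cfile=pyc, doraise=True)
        compiled.append(pyc)
    return compiled
```

Passing doraise=True is a choice made for this sketch; it means a typo in a hand-copied file stops the rebuild rather than surfacing later when the server starts.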
Assignee: nobody → bobm
Status: NEW → ASSIGNED
(Assignee)

Comment 8

3 years ago
(In reply to Bob Micheletto [:bobm] from comment #5)
> (In reply to Karl Thiessen [:kthiessen] from comment #4)
> Those usernames translate to the following UIDs:
> 
> ntl7liwvfw3lwekjdjqjl3cbsz5txy2s 13479301
> zcbzhlho2jti2ucokjlcevfrt2vn3x7o 13479305
> 6nqb637fllwn7rjzzi56syn5pums2u3m 13479308
> os5k46bzyope4tbxvueo445ypx7vuzvb 13479311
> p55rhekayiqjbvsgp3kgjmt3wrjagkhm 13479313

As discussed in an ad-hoc meeting between :kthiessen, :rfkelly, and me, the associated UIDs are quite serendipitous for testing, because an EOL probability setting of 10% should send EOL headers to the first three and not send them to the last two.
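
To see why those five UIDs split three and two at 10%, here is a minimal sketch.  It assumes the rollout check is the `userid % 100 < eol_rollout_percent` comparison that shows up in the traceback in comment 10; the function name gets_eol_header is hypothetical, and the sketch ignores the eol_general_release flag and the high-water-mark logic, neither of which is shown in this bug.

```python
def gets_eol_header(userid, rollout_percent):
    """Return True if this uid falls inside the rollout percentage."""
    # Mirrors the check quoted in the traceback: userid % 100 < eol_rollout_percent.
    # int() guards against the uid arriving as a string.
    return int(userid) % 100 < rollout_percent


# The five test uids listed in comment 5.
TEST_UIDS = [13479301, 13479305, 13479308, 13479311, 13479313]

for uid in TEST_UIDS:
    print("%d bucket=%d header=%s" % (uid, uid % 100, gets_eol_header(uid, 10)))
```

At 10% the buckets 1, 5, and 8 qualify while 11 and 13 do not.  Under the same assumption, 12% would add 13479311 and 14% would cover all five, which may be handy when walking through different rollout percentages.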
(Assignee)

Comment 9

3 years ago
Change window planned for 2015/01/13 at 16:00 PST.  See plan here: https://mana.mozilla.org/wiki/display/SVCOPS/CW-20150113+-+bobm
(Assignee)

Comment 10

3 years ago
(In reply to Bob Micheletto [:bobm] from comment #9)
> Change window planned for 2015/01/13 at 16:00 PST.  See plan here:
> https://mana.mozilla.org/wiki/display/SVCOPS/CW-20150113+-+bobm

Basic deployment worked, though we're getting the following traceback:
[2015/Jan/13:17:30:58 -0800] f833899b1ca8ec7a612c50dad3128a78
[2015/Jan/13:17:30:58 -0800] Uncaught exception while processing request:
GET /1.1/ntl7liwvfw3lwekjdjqjl3cbsz5txy2s/info/collections
  File "/usr/lib/python2.6/site-packages/services/util.py", line 304, in __call__
    return self.app(environ, start_response)
  File "/usr/lib/python2.6/site-packages/paste/translogger.py", line 68, in __call__
    return self.application(environ, replacement_start_response)
  File "/usr/lib/python2.6/site-packages/webob/dec.py", line 147, in __call__
    resp = self.call_func(req, *args, **self.kwargs)
  File "/usr/lib/python2.6/site-packages/webob/dec.py", line 208, in call_func
    return self.func(req, *args, **kwargs)
  File "/usr/lib/python2.6/site-packages/services/baseapp.py", line 225, in __notified
    response = func(self, request)
  File "/usr/lib/python2.6/site-packages/services/baseapp.py", line 259, in __call__
    response = self._dispatch_request(request)
  File "/usr/lib/python2.6/site-packages/services/baseapp.py", line 317, in _dispatch_request
    response = self._dispatch_request_with_match(request, match)
  File "/usr/lib/python2.6/site-packages/syncstorage/wsgiapp.py", line 291, in _dispatch_request_with_match
    userid % 100 < self.eol_rollout_percent)
<type 'exceptions.TypeError'>
TypeError('not all arguments converted during string formatting',)
(Assignee)

Comment 11

3 years ago
Added a quick in-place hack from :rfkelly to see if it resolves the problem: in wsgiapp.py, the userid from line 273 is now converted to an integer using int().  :kthiessen, please try syncing when possible to test the fix.
(Reporter)

Comment 12

3 years ago
It looks like request.user["userid"] is a string in prod, likely due to some combination of different config and older dependencies.  I went ahead and pushed the obvious fix of coercing it to an integer:

  http://hg.mozilla.org/services/server-storage/rev/105b45e99434

(This is the "quick in-place hack" that Bob refers to above)
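
For anyone puzzled by the error message: when the left operand of % is a string, Python treats the expression as string formatting rather than arithmetic.  A small sketch reproducing the failure, using one of the test UIDs as a stand-in value (the real value comes from request.user["userid"]):

```python
userid = "13479301"  # a string, as prod appears to return it

try:
    userid % 100  # str % int is string formatting, and the string has no format codes
    failed = False
except TypeError as exc:
    failed = True
    message = str(exc)

# Coercing to an integer first makes % the arithmetic modulo again.
fixed = int(userid) % 100
```

The message captured here is the same "not all arguments converted during string formatting" seen in the traceback in comment 10.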
(Assignee)

Comment 13

3 years ago
Getting closer.  A new, but related, traceback has emerged.

[2015/Jan/14:09:33:07 -0800] aa38e496a0dc0392eabcd45ffc9843e1
[2015/Jan/14:09:33:07 -0800] Uncaught exception while processing request:
GET /1.1/ntl7liwvfw3lwekjdjqjl3cbsz5txy2s/info/collections
  File "/usr/lib/python2.6/site-packages/services/util.py", line 304, in __call__
    return self.app(environ, start_response)
  File "/usr/lib/python2.6/site-packages/paste/translogger.py", line 68, in __call__
    return self.application(environ, replacement_start_response)
  File "/usr/lib/python2.6/site-packages/webob/dec.py", line 147, in __call__
    resp = self.call_func(req, *args, **self.kwargs)
  File "/usr/lib/python2.6/site-packages/webob/dec.py", line 208, in call_func
    return self.func(req, *args, **kwargs)
  File "/usr/lib/python2.6/site-packages/services/baseapp.py", line 225, in __notified
    response = func(self, request)
  File "/usr/lib/python2.6/site-packages/services/baseapp.py", line 259, in __call__
    response = self._dispatch_request(request)
  File "/usr/lib/python2.6/site-packages/services/baseapp.py", line 317, in _dispatch_request
    response = self._dispatch_request_with_match(request, match)
  File "/usr/lib/python2.6/site-packages/syncstorage/wsgiapp.py", line 273, in _dispatch_request_with_match
    int(userid = request.user['userid'])
<type 'exceptions.TypeError'>
TypeError("'userid' is an invalid keyword argument for this function",)
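
The line in this traceback suggests the hand edit wrapped the whole assignment in int(...), which Python parses as a call with a keyword argument named userid rather than as an assignment.  A minimal sketch of the difference, with a plain dict standing in for request.user:

```python
user = {"userid": "13479301"}  # hypothetical stand-in for request.user

try:
    # Inside a call, "name = value" is a keyword argument, not an assignment.
    int(userid=user["userid"])
except TypeError as exc:
    keyword_error = str(exc)

# The intended form: convert the value, then bind the result to the name.
userid = int(user["userid"])
```

The exact wording of the TypeError differs between Python versions, but the cause is the same.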
(Assignee)

Comment 14

3 years ago
(In reply to Ryan Kelly [:rfkelly] from comment #12)

I tried doing the type conversion like this, which is no doubt not a very pythoniacal style:
userid = request.user['userid']
userid = int(userid)

Now getting the following tracebacks:

@4000000054b6b6af21e0245c Traceback (most recent call last):
@4000000054b6b6af21e037e4   File "/usr/bin/gunicorn", line 9, in <module>
@4000000054b6b6af21e03fb4     load_entry_point('gunicorn==0.12.0', 'console_scripts', 'gunicorn')()
@4000000054b6b6af21e04784   File "/usr/lib/python2.6/site-packages/gunicorn/app/wsgiapp.py", line 32, in run
@4000000054b6b6af21e0533c     WSGIApplication("%prog [OPTIONS] APP_MODULE").run()
@4000000054b6b6af21e05b0c   File "/usr/lib/python2.6/site-packages/gunicorn/app/base.py", line 131, in run
@4000000054b6b6af21e22fcc     Arbiter(self).run()
@4000000054b6b6af21e22fcc   File "/usr/lib/python2.6/site-packages/gunicorn/arbiter.py", line 178, in run
@4000000054b6b6af21e26294     self.halt(reason=inst.reason, exit_status=inst.exit_status)
@4000000054b6b6af21e26294   File "/usr/lib/python2.6/site-packages/gunicorn/arbiter.py", line 273, in halt
@4000000054b6b6af21e42b9c     self.stop()
@4000000054b6b6af21e7581c   File "/usr/lib/python2.6/site-packages/gunicorn/arbiter.py", line 317, in stop
@4000000054b6b6af21ea78e4     self.reap_workers()
@4000000054b6b6af21ede3e4   File "/usr/lib/python2.6/site-packages/gunicorn/arbiter.py", line 401, in reap_workers
@4000000054b6b6af21f1bc44     raise HaltServer(reason, self.WORKER_BOOT_ERROR)
@4000000054b6b6af21f38934 gunicorn.errors.HaltServer: <HaltServer 'Worker failed to boot.' 3>
(Assignee)

Comment 15

3 years ago
(In reply to Bob Micheletto [:bobm] from comment #14)
> (In reply to Ryan Kelly [:rfkelly] from comment #12)
> 
> I tried doing the type conversion like this, which is no doubt not a very
> pythoniacal style:
> userid = request.user['userid']
> userid = int(userid)
> 

After copying the wsgiapp.py file directly from the rpm-1.15-10 tag, as suggested by :rfkelly, this is now working.  That does a more pythoniacal type conversion, like so:

userid = int(request.user['userid'])

However, it was expected that the following client would have been prompted to migrate, and was not: 
10.10.14.209 - ntl7liwvfw3lwekjdjqjl3cbsz5txy2s [14/Jan/2015:12:56:02 -0800] "GET /1.1/ntl7liwvfw3lwekjdjqjl3cbsz5txy2s/info/collections HTTP/1.1" 200 252 "-" "Firefox/38.0a1 FxSync/1.40.0.20150114030202.desktop" XFF="63.245.221.32" TIME=0.944 
10.10.14.209 - ntl7liwvfw3lwekjdjqjl3cbsz5txy2s [14/Jan/2015:12:56:03 -0800] "GET /1.1/ntl7liwvfw3lwekjdjqjl3cbsz5txy2s/storage/meta/global HTTP/1.1" 200 667 "-" "Firefox/38.0a1 FxSync/1.40.0.20150114030202.desktop" XFF="63.245.221.32" TIME=0.908 
10.10.14.209 - ntl7liwvfw3lwekjdjqjl3cbsz5txy2s [14/Jan/2015:12:56:05 -0800] "GET /1.1/ntl7liwvfw3lwekjdjqjl3cbsz5txy2s/storage/meta/global HTTP/1.1" 200 667 "-" "Firefox/38.0a1 FxSync/1.40.0.20150114030202.desktop" XFF="63.245.221.32" TIME=0.082 
10.10.14.209 - ntl7liwvfw3lwekjdjqjl3cbsz5txy2s [14/Jan/2015:12:56:05 -0800] "GET /1.1/ntl7liwvfw3lwekjdjqjl3cbsz5txy2s/storage/crypto/keys HTTP/1.1" 200 407 "-" "Firefox/38.0a1 FxSync/1.40.0.20150114030202.desktop" XFF="63.245.221.32" TIME=0.108 
10.10.14.209 - ntl7liwvfw3lwekjdjqjl3cbsz5txy2s [14/Jan/2015:12:56:06 -0800] "GET /1.1/ntl7liwvfw3lwekjdjqjl3cbsz5txy2s/info/collections?v=1.40.0 HTTP/1.1" 200 252 "-" "Firefox/38.0a1 FxSync/1.40.0.20150114030202.desktop" XFF="63.245.221.32" TIME=0.854 
10.10.14.209 - ntl7liwvfw3lwekjdjqjl3cbsz5txy2s [14/Jan/2015:12:56:06 -0800] "GET /1.1/ntl7liwvfw3lwekjdjqjl3cbsz5txy2s/storage/clients?full=1 HTTP/1.1" 200 1000 "-" "Firefox/38.0a1 FxSync/1.40.0.20150114030202.desktop" XFF="63.245.221.32" TIME=0.120 
10.10.14.209 - ntl7liwvfw3lwekjdjqjl3cbsz5txy2s [14/Jan/2015:12:56:07 -0800] "POST /1.1/ntl7liwvfw3lwekjdjqjl3cbsz5txy2s/storage/clients HTTP/1.1" 200 70 "-" "Firefox/38.0a1 FxSync/1.40.0.20150114030202.desktop" XFF="63.245.221.32" TIME=0.126 
10.10.14.209 - ntl7liwvfw3lwekjdjqjl3cbsz5txy2s [14/Jan/2015:12:56:07 -0800] "POST /1.1/ntl7liwvfw3lwekjdjqjl3cbsz5txy2s/storage/prefs HTTP/1.1" 200 110 "-" "Firefox/38.0a1 FxSync/1.40.0.20150114030202.desktop" XFF="63.245.221.32" TIME=0.199 
10.10.14.209 - ntl7liwvfw3lwekjdjqjl3cbsz5txy2s [14/Jan/2015:12:56:07 -0800] "GET /1.1/ntl7liwvfw3lwekjdjqjl3cbsz5txy2s/storage/tabs?full=1 HTTP/1.1" 200 686 "-" "Firefox/38.0a1 FxSync/1.40.0.20150114030202.desktop" XFF="63.245.221.32" TIME=0.077 
10.10.14.209 - ntl7liwvfw3lwekjdjqjl3cbsz5txy2s [14/Jan/2015:12:56:07 -0800] "POST /1.1/ntl7liwvfw3lwekjdjqjl3cbsz5txy2s/storage/addons HTTP/1.1" 200 214 "-" "Firefox/38.0a1 FxSync/1.40.0.20150114030202.desktop" XFF="63.245.221.32" TIME=0.145 
10.10.14.209 - ntl7liwvfw3lwekjdjqjl3cbsz5txy2s [14/Jan/2015:12:56:07 -0800] "POST /1.1/ntl7liwvfw3lwekjdjqjl3cbsz5txy2s/storage/history HTTP/1.1" 200 214 "-" "Firefox/38.0a1 FxSync/1.40.0.20150114030202.desktop" XFF="63.245.221.32" TIME=0.123 
10.10.14.209 - ntl7liwvfw3lwekjdjqjl3cbsz5txy2s [14/Jan/2015:12:56:07 -0800] "GET /1.1/ntl7liwvfw3lwekjdjqjl3cbsz5txy2s/storage/meta/fxa_credentials HTTP/1.1" 404 154 "-" "Firefox/38.0a1 FxSync/1.40.0.20150114030202.desktop" XFF="63.245.221.32" TIME=0.090 
10.10.14.209 - ntl7liwvfw3lwekjdjqjl3cbsz5txy2s [14/Jan/2015:12:57:38 -0800] "GET /1.1/ntl7liwvfw3lwekjdjqjl3cbsz5txy2s/info/collections HTTP/1.1" 200 252 "-" "Firefox/38.0a1 FxSync/1.40.0.20150114030202.desktop" XFF="63.245.221.32" TIME=0.063 
10.10.14.209 - ntl7liwvfw3lwekjdjqjl3cbsz5txy2s [14/Jan/2015:12:57:38 -0800] "POST /1.1/ntl7liwvfw3lwekjdjqjl3cbsz5txy2s/storage/tabs HTTP/1.1" 200 70 "-" "Firefox/38.0a1 FxSync/1.40.0.20150114030202.desktop" XFF="63.245.221.32" TIME=0.073 
10.10.14.209 - ntl7liwvfw3lwekjdjqjl3cbsz5txy2s [14/Jan/2015:12:57:38 -0800] "POST /1.1/ntl7liwvfw3lwekjdjqjl3cbsz5txy2s/storage/history HTTP/1.1" 200 70 "-" "Firefox/38.0a1 FxSync/1.40.0.20150114030202.desktop" XFF="63.245.221.32" TIME=0.084 
10.10.14.209 - ntl7liwvfw3lwekjdjqjl3cbsz5txy2s [14/Jan/2015:12:57:38 -0800] "GET /1.1/ntl7liwvfw3lwekjdjqjl3cbsz5txy2s/storage/meta/fxa_credentials HTTP/1.1" 404 154 "-" "Firefox/38.0a1 FxSync/1.40.0.20150114030202.desktop" XFF="63.245.221.32" TIME=0.077
(Assignee)

Comment 16

3 years ago
(In reply to Bob Micheletto [:bobm] from comment #15)
> (In reply to Bob Micheletto [:bobm] from comment #14)
> > (In reply to Ryan Kelly [:rfkelly] from comment #12)

> However, it was expected that the following client would have been prompted
> to migrate, and was not: 

Turns out that client had two client records, and was therefore precluded from the migration.  After setting the EOL general release setting to true, the client was prompted for migration.  

:kthiessen, would you like to add anything else?
(Assignee)

Comment 17

3 years ago
The 1.15-10 RPM has been cobbled together and installed on the staging server.  I've restarted gunicorn with a HUP signal, so hopefully that avoids the socket restart dance we were running into before.  :kthiessen, please test that this version works with a sync or two.
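
For reference, a HUP-based reload can be sketched in Python as below.  This is a hedged illustration, not the command that was run here: the pidfile location is whatever the deployment uses (none is given in this bug), the function name reload_gunicorn is made up, and the kill parameter exists only so the sketch can be exercised without a running server.  Current gunicorn documentation describes HUP as reloading the configuration and gracefully replacing workers; the behaviour of the 0.12.0 version in the traceback above has not been checked.

```python
import os
import signal


def reload_gunicorn(pidfile, kill=os.kill):
    """Send SIGHUP to the gunicorn master whose pid is stored in pidfile."""
    with open(pidfile) as handle:
        pid = int(handle.read().strip())
    # SIGHUP asks the master to reload and replace its workers,
    # leaving the master process itself running.
    kill(pid, signal.SIGHUP)
    return pid
```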
Done.  Looks good, Bob.  Thank you.
(Assignee)

Comment 19

3 years ago
Created attachment 8550003 [details]
Screen shot of new Sync 1.1 metrics related to migration in Graphite console.

Here's a screen capture showing the new Sync 1.1 metrics related to the migration from the testing done yesterday.
(Assignee)

Comment 20

3 years ago
Closing this bug out as a success.
Status: ASSIGNED → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED
Marking this as VERIFIED to indicate success.
Status: RESOLVED → VERIFIED