Closed Bug 1118108 Opened 10 years ago Closed 10 years ago

Deploy and test a single node for sync1.1 storage version 1.18-1

Categories

(Cloud Services :: Operations: Deployment Requests - DEPRECATED, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: rfkelly, Assigned: bobm)

Attachments

(1 file)

In preparation for sync1.1 EOL, we need to deploy and test the EOL-header-sending code from Bug 1110013. For a variety of hysterical raisins (e.g. no surviving build environment, no stage hardware) we can't do this like a standard deploy so we'll have to special-case it.

The plan, per discussion with :bobm:

* I've tagged rpm-1.18-1 on http://hg.mozilla.org/services/server-storage/ with the EOL-header-sending code and no other changes.
* Rather than building a fresh RPM, we'll take the existing rpm-1.17-2 rpms and rebuild them to incorporate the updated code and version number. This involves replacing only the syncstorage python code - no dependencies need to be touched and nothing needs to be recompiled.
* For testing, we take a single production webhead out of the loadbalancer and push the new code to it. We can then use loadbalancer rules to direct specific user accounts to that node for some good old-fashioned manual QA.

To QA this thing we'll need to try it with various combinations of flags and client behaviour:

* sanity-check that no headers are sent by default
* with eol_general_release off, try it with nightly versus other browsers and confirm that we only see the headers on nightly
* with eol_general_release on, try it with various browsers and ensure they all get the header
* use accounts with known uids to test the uid-based rollout percentage logic at different percentages
* walk through a rollback of the rollout percentage, so that we can check whether the high-water-mark logic is working correctly.

We can flesh out the details of each of these checks as they come up. Any others I've missed at this high level?
QA Contact: kthiessen
This looks good to me; I'll set up a short meeting this week in the Australian morning/PST afternoon with :rfkelly and :bobm to walk through various bits and ask questions.
The current installed version of the Sync 1.1 is syncstorage-1.15-7.
There's a lot of churn between that and the current tag, but it looks like it's all irrelevant to our change. I will branch from current release tag and apply the patches there, creating a new release in the 1.15 series.
76gzqkmx.Migrate001/prefs.js:user_pref("services.sync.username", "ntl7liwvfw3lwekjdjqjl3cbsz5txy2s");
a2ntf8tk.Migrate002/prefs.js:user_pref("services.sync.username", "zcbzhlho2jti2ucokjlcevfrt2vn3x7o");
s1qxnvr0.Migrate004/prefs.js:user_pref("services.sync.username", "6nqb637fllwn7rjzzi56syn5pums2u3m");
wia205vk.Migrate006/prefs.js:user_pref("services.sync.username", "p55rhekayiqjbvsgp3kgjmt3wrjagkhm");
zp5568j8.Migrate005/prefs.js:user_pref("services.sync.username", "os5k46bzyope4tbxvueo445ypx7vuzvb");
(In reply to Karl Thiessen [:kthiessen] from comment #4)
Those usernames translate to the following UIDs:

ntl7liwvfw3lwekjdjqjl3cbsz5txy2s 13479301
zcbzhlho2jti2ucokjlcevfrt2vn3x7o 13479305
6nqb637fllwn7rjzzi56syn5pums2u3m 13479308
os5k46bzyope4tbxvueo445ypx7vuzvb 13479311
p55rhekayiqjbvsgp3kgjmt3wrjagkhm 13479313
Depends on: 1119110
Alrighty, I have made a "rpm-1.15-9" that backports Bug 1110013 and Bug 1119110 onto the latest release branch. There was an rpm-1.15-8 version that was never released, but it contains only a trivial change which I'm happy to ship as part of this. Bob, over to you for an attempt at building these RPMs.
The python26-syncstorage-1.15-9.noarch.rpm has been built. I'll detail the process here because of the slight tangent to orthodoxy in the build process. However, this ensures the resulting RPM is as close to the current running environment as possible.

1. The python26-syncstorage-1.15-7.noarch.rpm was disassembled using a combination of rpm2cpio and cpio -imvd.
2. The spec file was recovered using the rpmrebuild command.
3. The spec file was edited to change the current release and buildroot.
4. The four files that were changed in the rpm-1.15-9 tag were copied over.
5. The python byte code files for those files were removed, and regenerated using the py_compile module.
6. Everything was repackaged using rpmbuild -bb.
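Step 5 above can be sketched in Python. This is only a minimal sketch, not the commands that were actually run: the function name regenerate_bytecode and its two arguments (the directory the RPM was unpacked into, and the relative paths of the replaced files) are assumptions made for illustration. Python 2.6 writes foo.pyc next to foo.py, so the sketch passes cfile explicitly to get the same layout on newer interpreters too.

```python
import os
import py_compile


def regenerate_bytecode(tree_root, changed_files):
    """Remove stale bytecode for each changed source file and recompile it.

    tree_root and changed_files are hypothetical inputs: the directory the
    RPM was unpacked into, and the replaced .py files relative to it.
    """
    compiled = []
    for rel_path in changed_files:
        source = os.path.join(tree_root, rel_path)
        # Drop any stale .pyc/.pyo left over from the old package.
        for suffix in ("c", "o"):
            stale = source + suffix
            if os.path.exists(stale):
                os.remove(stale)
        # Write the new bytecode next to the source file.
        target = source + "c"
        py_compile.compile(source, cfile=target, doraise=True)
        compiled.append(target)
    return compiled
```

Passing doraise=True makes a syntax error in a copied file fail loudly at build time rather than at worker boot.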
Assignee: nobody → bobm
Status: NEW → ASSIGNED
(In reply to Bob Micheletto [:bobm] from comment #5)
> (In reply to Karl Thiessen [:kthiessen] from comment #4)
> Those usernames translate to the following UIDs:
>
> ntl7liwvfw3lwekjdjqjl3cbsz5txy2s 13479301
> zcbzhlho2jti2ucokjlcevfrt2vn3x7o 13479305
> 6nqb637fllwn7rjzzi56syn5pums2u3m 13479308
> os5k46bzyope4tbxvueo445ypx7vuzvb 13479311
> p55rhekayiqjbvsgp3kgjmt3wrjagkhm 13479313

As discussed in an ad-hoc meeting between :kthiessen, :rfkelly, and me, the associated UIDs are quite serendipitous for testing because an EOL probability setting of 10% should send EOL headers to the first three, and not send them to the last two.
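To see why these UIDs split so neatly, here is a minimal sketch, assuming the rollout check compares the uid modulo 100 against the configured percentage (the comparison `userid % 100 < self.eol_rollout_percent` that appears in the server traceback later in this bug). The function name gets_eol_header is made up for illustration and is not the server's API.

```python
def gets_eol_header(userid, rollout_percent):
    """Return True if this uid falls inside the rollout percentage.

    Hypothetical helper; mirrors the `userid % 100 < percent` comparison.
    """
    return int(userid) % 100 < rollout_percent


# The five test accounts from this bug.
TEST_UIDS = [13479301, 13479305, 13479308, 13479311, 13479313]

for uid in TEST_UIDS:
    print(uid, uid % 100, gets_eol_header(uid, 10))
```

The remainders are 1, 5, 8, 11 and 13, so at 10% only the first three are selected. Under the same assumption, raising the setting to 12% would add 13479311 but not 13479313.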
Change window planned for 2015/01/13 at 16:00 PST. See plan here: https://mana.mozilla.org/wiki/display/SVCOPS/CW-20150113+-+bobm
(In reply to Bob Micheletto [:bobm] from comment #9)
> Change window planned for 2015/01/13 at 16:00 PST. See plan here:
> https://mana.mozilla.org/wiki/display/SVCOPS/CW-20150113+-+bobm

Basic deployment worked, though getting the following traceback:

[2015/Jan/13:17:30:58 -0800] f833899b1ca8ec7a612c50dad3128a78
[2015/Jan/13:17:30:58 -0800] Uncaught exception while processing request:
GET /1.1/ntl7liwvfw3lwekjdjqjl3cbsz5txy2s/info/collections
  File "/usr/lib/python2.6/site-packages/services/util.py", line 304, in __call__
    return self.app(environ, start_response)
  File "/usr/lib/python2.6/site-packages/paste/translogger.py", line 68, in __call__
    return self.application(environ, replacement_start_response)
  File "/usr/lib/python2.6/site-packages/webob/dec.py", line 147, in __call__
    resp = self.call_func(req, *args, **self.kwargs)
  File "/usr/lib/python2.6/site-packages/webob/dec.py", line 208, in call_func
    return self.func(req, *args, **kwargs)
  File "/usr/lib/python2.6/site-packages/services/baseapp.py", line 225, in __notified
    response = func(self, request)
  File "/usr/lib/python2.6/site-packages/services/baseapp.py", line 259, in __call__
    response = self._dispatch_request(request)
  File "/usr/lib/python2.6/site-packages/services/baseapp.py", line 317, in _dispatch_request
    response = self._dispatch_request_with_match(request, match)
  File "/usr/lib/python2.6/site-packages/syncstorage/wsgiapp.py", line 291, in _dispatch_request_with_match
    userid % 100 < self.eol_rollout_percent)
<type 'exceptions.TypeError'>
TypeError('not all arguments converted during string formatting',)
Added a quick in-place hack from :rfkelly to see if it resolves the problem: converting the userid on line 273 of wsgiapp.py to an integer using int(). :kthiessen please try syncing when possible to test the fix.
It looks like request.user["userid"] is a string in prod, likely due to some combination of different config and older dependencies. I went ahead and pushed the obvious fix of coercing it to an integer: http://hg.mozilla.org/services/server-storage/rev/105b45e99434 (This is the "quick in-place hack" that Bob refers to above)
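The original error message makes sense once you recall that % on a string is the formatting operator rather than modulo. A small standalone sketch of the failure and of the coercion fix; raw_userid is a stand-in value for illustration, not real request data:

```python
raw_userid = "13479301"  # stand-in for a userid that arrives as a string

try:
    # With a str on the left, % attempts string formatting with 100 as
    # the argument; the string has no format specifiers, so it fails.
    raw_userid % 100
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Coercing to int first gives the intended modulo arithmetic.
remainder = int(raw_userid) % 100

print(error_message)
print(remainder)
```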
Getting closer. A new, but related, traceback has emerged.

[2015/Jan/14:09:33:07 -0800] aa38e496a0dc0392eabcd45ffc9843e1
[2015/Jan/14:09:33:07 -0800] Uncaught exception while processing request:
GET /1.1/ntl7liwvfw3lwekjdjqjl3cbsz5txy2s/info/collections
  File "/usr/lib/python2.6/site-packages/services/util.py", line 304, in __call__
    return self.app(environ, start_response)
  File "/usr/lib/python2.6/site-packages/paste/translogger.py", line 68, in __call__
    return self.application(environ, replacement_start_response)
  File "/usr/lib/python2.6/site-packages/webob/dec.py", line 147, in __call__
    resp = self.call_func(req, *args, **self.kwargs)
  File "/usr/lib/python2.6/site-packages/webob/dec.py", line 208, in call_func
    return self.func(req, *args, **kwargs)
  File "/usr/lib/python2.6/site-packages/services/baseapp.py", line 225, in __notified
    response = func(self, request)
  File "/usr/lib/python2.6/site-packages/services/baseapp.py", line 259, in __call__
    response = self._dispatch_request(request)
  File "/usr/lib/python2.6/site-packages/services/baseapp.py", line 317, in _dispatch_request
    response = self._dispatch_request_with_match(request, match)
  File "/usr/lib/python2.6/site-packages/syncstorage/wsgiapp.py", line 273, in _dispatch_request_with_match
    int(userid = request.user['userid'])
<type 'exceptions.TypeError'>
TypeError("'userid' is an invalid keyword argument for this function",)
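For anyone reading along: inside a call, `userid = ...` is parsed as a keyword argument named userid, not as an assignment, and int() accepts no such keyword. A minimal sketch of the difference, using a plain dict named user as a stand-in for request.user (an assumption for illustration only):

```python
user = {"userid": "13479301"}  # stand-in for request.user

try:
    # Parsed as the keyword argument userid=..., which int() rejects.
    int(userid=user["userid"])
    keyword_call_failed = False
except TypeError:
    keyword_call_failed = True

# The assignment belongs outside the call.
userid = int(user["userid"])
```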
(In reply to Ryan Kelly [:rfkelly] from comment #12)

I tried doing the type conversion like this, which is no doubt not a very pythoniacal style:

  userid = request.user['userid']
  userid = int(userid)

Now getting the following tracebacks:

@4000000054b6b6af21e0245c Traceback (most recent call last):
@4000000054b6b6af21e037e4 File "/usr/bin/gunicorn", line 9, in <module>
@4000000054b6b6af21e03fb4 load_entry_point('gunicorn==0.12.0', 'console_scripts', 'gunicorn')()
@4000000054b6b6af21e04784 File "/usr/lib/python2.6/site-packages/gunicorn/app/wsgiapp.py", line 32, in run
@4000000054b6b6af21e0533c WSGIApplication("%prog [OPTIONS] APP_MODULE").run()
@4000000054b6b6af21e05b0c File "/usr/lib/python2.6/site-packages/gunicorn/app/base.py", line 131, in run
@4000000054b6b6af21e22fcc Arbiter(self).run()
@4000000054b6b6af21e22fcc File "/usr/lib/python2.6/site-packages/gunicorn/arbiter.py", line 178, in run
@4000000054b6b6af21e26294 self.halt(reason=inst.reason, exit_status=inst.exit_status)
@4000000054b6b6af21e26294 File "/usr/lib/python2.6/site-packages/gunicorn/arbiter.py", line 273, in halt
@4000000054b6b6af21e42b9c self.stop()
@4000000054b6b6af21e7581c File "/usr/lib/python2.6/site-packages/gunicorn/arbiter.py", line 317, in stop
@4000000054b6b6af21ea78e4 self.reap_workers()
@4000000054b6b6af21ede3e4 File "/usr/lib/python2.6/site-packages/gunicorn/arbiter.py", line 401, in reap_workers
@4000000054b6b6af21f1bc44 raise HaltServer(reason, self.WORKER_BOOT_ERROR)
@4000000054b6b6af21f38934 gunicorn.errors.HaltServer: <HaltServer 'Worker failed to boot.' 3>
(In reply to Bob Micheletto [:bobm] from comment #14)
> (In reply to Ryan Kelly [:rfkelly] from comment #12)
>
> I tried doing the type conversion like this, which is no doubt not a very
> pythoniacal style:
> userid = request.user['userid']
> userid = int(userid)
>

After copying the wsgiapp.py file directly from the rpm-1.15-10 tag, as suggested by :rfkelly, this is now working. That does a more pythoniacal type conversion, like so:

  userid = int(request.user['userid'])

However, it was expected that the following client would have been prompted to migrate, and was not:

10.10.14.209 - ntl7liwvfw3lwekjdjqjl3cbsz5txy2s [14/Jan/2015:12:56:02 -0800] "GET /1.1/ntl7liwvfw3lwekjdjqjl3cbsz5txy2s/info/collections HTTP/1.1" 200 252 "-" "Firefox/38.0a1 FxSync/1.40.0.20150114030202.desktop" XFF="63.245.221.32" TIME=0.944
10.10.14.209 - ntl7liwvfw3lwekjdjqjl3cbsz5txy2s [14/Jan/2015:12:56:03 -0800] "GET /1.1/ntl7liwvfw3lwekjdjqjl3cbsz5txy2s/storage/meta/global HTTP/1.1" 200 667 "-" "Firefox/38.0a1 FxSync/1.40.0.20150114030202.desktop" XFF="63.245.221.32" TIME=0.908
10.10.14.209 - ntl7liwvfw3lwekjdjqjl3cbsz5txy2s [14/Jan/2015:12:56:05 -0800] "GET /1.1/ntl7liwvfw3lwekjdjqjl3cbsz5txy2s/storage/meta/global HTTP/1.1" 200 667 "-" "Firefox/38.0a1 FxSync/1.40.0.20150114030202.desktop" XFF="63.245.221.32" TIME=0.082
10.10.14.209 - ntl7liwvfw3lwekjdjqjl3cbsz5txy2s [14/Jan/2015:12:56:05 -0800] "GET /1.1/ntl7liwvfw3lwekjdjqjl3cbsz5txy2s/storage/crypto/keys HTTP/1.1" 200 407 "-" "Firefox/38.0a1 FxSync/1.40.0.20150114030202.desktop" XFF="63.245.221.32" TIME=0.108
10.10.14.209 - ntl7liwvfw3lwekjdjqjl3cbsz5txy2s [14/Jan/2015:12:56:06 -0800] "GET /1.1/ntl7liwvfw3lwekjdjqjl3cbsz5txy2s/info/collections?v=1.40.0 HTTP/1.1" 200 252 "-" "Firefox/38.0a1 FxSync/1.40.0.20150114030202.desktop" XFF="63.245.221.32" TIME=0.854
10.10.14.209 - ntl7liwvfw3lwekjdjqjl3cbsz5txy2s [14/Jan/2015:12:56:06 -0800] "GET /1.1/ntl7liwvfw3lwekjdjqjl3cbsz5txy2s/storage/clients?full=1 HTTP/1.1" 200 1000 "-" "Firefox/38.0a1 FxSync/1.40.0.20150114030202.desktop" XFF="63.245.221.32" TIME=0.120
10.10.14.209 - ntl7liwvfw3lwekjdjqjl3cbsz5txy2s [14/Jan/2015:12:56:07 -0800] "POST /1.1/ntl7liwvfw3lwekjdjqjl3cbsz5txy2s/storage/clients HTTP/1.1" 200 70 "-" "Firefox/38.0a1 FxSync/1.40.0.20150114030202.desktop" XFF="63.245.221.32" TIME=0.126
10.10.14.209 - ntl7liwvfw3lwekjdjqjl3cbsz5txy2s [14/Jan/2015:12:56:07 -0800] "POST /1.1/ntl7liwvfw3lwekjdjqjl3cbsz5txy2s/storage/prefs HTTP/1.1" 200 110 "-" "Firefox/38.0a1 FxSync/1.40.0.20150114030202.desktop" XFF="63.245.221.32" TIME=0.199
10.10.14.209 - ntl7liwvfw3lwekjdjqjl3cbsz5txy2s [14/Jan/2015:12:56:07 -0800] "GET /1.1/ntl7liwvfw3lwekjdjqjl3cbsz5txy2s/storage/tabs?full=1 HTTP/1.1" 200 686 "-" "Firefox/38.0a1 FxSync/1.40.0.20150114030202.desktop" XFF="63.245.221.32" TIME=0.077
10.10.14.209 - ntl7liwvfw3lwekjdjqjl3cbsz5txy2s [14/Jan/2015:12:56:07 -0800] "POST /1.1/ntl7liwvfw3lwekjdjqjl3cbsz5txy2s/storage/addons HTTP/1.1" 200 214 "-" "Firefox/38.0a1 FxSync/1.40.0.20150114030202.desktop" XFF="63.245.221.32" TIME=0.145
10.10.14.209 - ntl7liwvfw3lwekjdjqjl3cbsz5txy2s [14/Jan/2015:12:56:07 -0800] "POST /1.1/ntl7liwvfw3lwekjdjqjl3cbsz5txy2s/storage/history HTTP/1.1" 200 214 "-" "Firefox/38.0a1 FxSync/1.40.0.20150114030202.desktop" XFF="63.245.221.32" TIME=0.123
10.10.14.209 - ntl7liwvfw3lwekjdjqjl3cbsz5txy2s [14/Jan/2015:12:56:07 -0800] "GET /1.1/ntl7liwvfw3lwekjdjqjl3cbsz5txy2s/storage/meta/fxa_credentials HTTP/1.1" 404 154 "-" "Firefox/38.0a1 FxSync/1.40.0.20150114030202.desktop" XFF="63.245.221.32" TIME=0.090
10.10.14.209 - ntl7liwvfw3lwekjdjqjl3cbsz5txy2s [14/Jan/2015:12:57:38 -0800] "GET /1.1/ntl7liwvfw3lwekjdjqjl3cbsz5txy2s/info/collections HTTP/1.1" 200 252 "-" "Firefox/38.0a1 FxSync/1.40.0.20150114030202.desktop" XFF="63.245.221.32" TIME=0.063
10.10.14.209 - ntl7liwvfw3lwekjdjqjl3cbsz5txy2s [14/Jan/2015:12:57:38 -0800] "POST /1.1/ntl7liwvfw3lwekjdjqjl3cbsz5txy2s/storage/tabs HTTP/1.1" 200 70 "-" "Firefox/38.0a1 FxSync/1.40.0.20150114030202.desktop" XFF="63.245.221.32" TIME=0.073
10.10.14.209 - ntl7liwvfw3lwekjdjqjl3cbsz5txy2s [14/Jan/2015:12:57:38 -0800] "POST /1.1/ntl7liwvfw3lwekjdjqjl3cbsz5txy2s/storage/history HTTP/1.1" 200 70 "-" "Firefox/38.0a1 FxSync/1.40.0.20150114030202.desktop" XFF="63.245.221.32" TIME=0.084
10.10.14.209 - ntl7liwvfw3lwekjdjqjl3cbsz5txy2s [14/Jan/2015:12:57:38 -0800] "GET /1.1/ntl7liwvfw3lwekjdjqjl3cbsz5txy2s/storage/meta/fxa_credentials HTTP/1.1" 404 154 "-" "Firefox/38.0a1 FxSync/1.40.0.20150114030202.desktop" XFF="63.245.221.32" TIME=0.077
(In reply to Bob Micheletto [:bobm] from comment #15)
> (In reply to Bob Micheletto [:bobm] from comment #14)
> > (In reply to Ryan Kelly [:rfkelly] from comment #12)
> However, it was expected that the following client would have been prompted
> to migrate, and was not:

Turns out that client had two client records, and was therefore precluded from the migration. After setting the EOL general release setting to true, the client was prompted for migration. :kthiessen, would you like to add anything else?
The 1.15-10 RPM has been cobbled together and installed on the staging server. I've restarted gunicorn with a HUP signal, so hopefully that avoids the socket restart dance we were running into before. :kthiessen, please test that this version works with a sync or two.
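For reference, gunicorn documents HUP as a reload signal: the master re-reads its configuration and replaces the workers gracefully rather than being stopped and started. A minimal sketch of sending it from Python, assuming the master's process id is already known (for example from a pidfile); the function name reload_gunicorn and the master_pid argument are made up for illustration, and behaviour on the old gunicorn 0.12.0 used here may differ in detail.

```python
import os
import signal


def reload_gunicorn(master_pid):
    """Send SIGHUP to the gunicorn master process (POSIX only).

    master_pid is a hypothetical input, e.g. read from a pidfile.
    """
    os.kill(master_pid, signal.SIGHUP)
```

This is equivalent to running kill -HUP against the master pid from a shell.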
Done. Looks good, Bob. Thank you.
Here's a screen capture showing the new Sync 1.1 metrics related to the migration from the testing done yesterday.
Closing this bug out as a success.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Marking this as VERIFIED to indicate success.
Status: RESOLVED → VERIFIED