Closed Bug 976907 Opened 10 years ago Closed 10 years ago

Ensure that RemoteVerifier uses proper connection pooling

Categories

(Cloud Services Graveyard :: Server: Token, defect, P2)

defect

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: rfkelly, Assigned: rfkelly)

Details

(Whiteboard: [qa+])

Attachments

(2 files)

I suspect that the tokenserver.verifiers.RemoteVerifier class does not use any connection pooling, meaning it spins up a new SSL connection for each incoming request, meaning badness.
Attached image 3pnw0.png
I see this in the paster output when doing a load test.
Ideally, we'd see a handful of these log lines at startup and then the same connections would be used over and over.  Does it appear we're basically seeing one for each tokenserver request?
Flags: needinfo?(bwong)
OS: Windows 7 → All
Hardware: x86_64 → All
Patch makes RemoteVerifier use a requests session, which does connection pooling automagically.  I tested this with a locally-running server and some print statements, to confirm that it does indeed re-use connections for multiple verification requests.

(clearing needinfo since this patch does want I want in any case)
Attachment #8381914 - Flags: review?(telliott)
Flags: needinfo?(bwong)
:rfkelly

FYI, I installed browserid_verifier on localhost on the server. I reran the simple benchmark without an SSL and this is the profile: 

https://rpm.newrelic.com/accounts/315282/applications/3684299/profiles/75379
Checking the "whats new in python 2.7" document (http://docs.python.org/2/whatsnew/2.7.html) and grepping for "httlib" reveals this little gem:

"""
The default HTTPResponse class used by the httplib module now supports buffering, resulting in much faster reading of HTTP responses.
"""

So this could at least partly be a known issue in python2.6, where it reads the HTTP response byte-by-byte looking for newline characters in order to split up the HTTP headers.  This is precisely the "readline" method that shows up in the newrelic profiles.

I'm not sure if there's a viable workaround for this, but it would interesting to try it on python2.7 for comparison.  Benson, is python2.7 even a remote possibility for this deployment?
Getting python 2.7 running a more remote possibility than just running the verifier locally with a multi-core box. :) 

I'll see if I can get it running under python27 and if it makes any difference.
Results w/ python27: 

https://rpm.newrelic.com/accounts/315282/applications/3684299/profiles/75386

not surprisingly it is spending most of its time still in the SSL library. 

here is the python 2.6 result again: 

https://rpm.newrelic.com/accounts/315282/applications/3684299/profiles/75372
:rfkelly, so I put your patch in and it is *much* better. 

- it actually uses the connection pool. I see it spinning up a pool of 10 connections and no more are used
- the CPU on the server is at 30% with the load test (make bench) vs 100% before

The new profile: https://rpm.newrelic.com/accounts/315282/applications/3684299/profiles/75390
- it is still spending most of the time in SSL read *but* I bet before the box was busy with SSL handshakes. Asymmetric vs symmetric encryption I guess... 

I say let's patch it into the GH repo and I'll spin a new RPM and deploy a new TS stage stack
Stupid thought : since we're just *verifying*, can't we just set a firewall rule and use plain HTTP between the TS and the remote verifier ?
:tarek more coupling between stacks. More changes to ELBs, etc. 

If we go an HTTP route, it'd be faster to just install the verifier in the same box as the tokenserver and have them talk through http://127.0.0.1. I actually tested that and it works great too.
Attachment #8381914 - Flags: review?(telliott) → review+
:mostlygeek that was our roughly initial set up ;)
Whiteboard: [qa+]
committed in https://github.com/mozilla-services/tokenserver/commit/b55e9533835258f16bf1611521a9efc11aa2afbc
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Priority: -- → P3
Priority: P3 → P2
Should be good now since this is out in Stage/Prod and we have been load testing it...
Also verified in code.
Status: RESOLVED → VERIFIED
Product: Cloud Services → Cloud Services Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: