Closed Bug 825766 Opened 12 years ago Closed 12 years ago

configure Mozpool Apache frontend to not limit connection count

Categories

(Infrastructure & Operations :: RelOps: General, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dustin, Assigned: dustin)

Details

Attachments

(1 file)

I'm not entirely sure what's going on, but requests to Mozpool are noticeably slower when they go through Apache than when I talk directly to the mozpool daemon. Apache seems to add some delay, perhaps because its backend connection queue is full, or as DDoS protection, or something.

This is also causing some 502s for the mozharness script; see bug 825727.
I was able to replicate this with:

---
while true; do
  curl http://mobile-imaging-002.p2.releng.scl1.mozilla.com/api/device/panda-0101/status/
  date
done
---

It's very clear that it runs at full speed for about a half-dozen requests, then takes a few seconds for each subsequent request.
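
A rough sketch for putting numbers on the slowdown instead of eyeballing `date` output. It assumes curl's `-w '%{time_total}'` write-out variable is available; the function names `time_requests` and `summarize` are made up for illustration and were not part of the original reproduction.

```shell
# Print the total time in seconds of each of COUNT requests, one per line.
# usage: time_requests URL [COUNT]   (hypothetical helper, COUNT defaults to 20)
time_requests() {
  url=$1
  count=${2:-20}
  i=0
  while [ "$i" -lt "$count" ]; do
    # -s: no progress meter, -o /dev/null: discard the body,
    # -w: print only the total time taken by this request
    curl -s -o /dev/null -w '%{time_total}\n' "$url"
    i=$((i + 1))
  done
}

# Read one time per line on stdin; print count, min, median and max.
summarize() {
  sort -n | awk '
    { t[NR] = $1 }
    END {
      if (NR == 0) { print "n=0"; exit }
      mid = (NR % 2) ? t[(NR + 1) / 2] : (t[NR / 2] + t[NR / 2 + 1]) / 2
      printf "n=%d min=%.3f median=%.3f max=%.3f\n", NR, t[1], mid, t[NR]
    }'
}
```

For example, `time_requests http://mobile-imaging-002.p2.releng.scl1.mozilla.com/api/device/panda-0101/status/ 20 | summarize` prints a single line, which makes the jump from fast to multi-second requests easy to compare between runs.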

I was able to "fix" this by reconfiguring Apache with `ProxyPass .. disablereuse=On`, but then I *reverted* this setting and it was still fixed.  So there's still some mystery here.
OK, I can replicate this as follows:

On a clean httpd, this runs all requests in the low hundreds of ms:

  ab -n 2000 -c 10 http://mobile-imaging-002.p2.releng.scl1.mozilla.com/api/device/panda-0101/state/

but running

  ab -n 2000 -c 20 http://mobile-imaging-002.p2.releng.scl1.mozilla.com/api/device/panda-0101/state/

will put Apache in a mode where some requests take >10s, and even re-running the `-c 10` version afterwards will show these long requests.

NOTE: I used /state/ because it caches data locally and doesn't hit the DB for every request.

Adding 'disablereuse=On' to the proxy worker config prevents this terrible behavior. Instead, up to a "reasonable" concurrency (4), I get <30ms maximum. With high concurrency (20), the median is still in the low tens of ms, but with a few 3000ms requests. I assume that this is a 3s timeout in either Apache or web.py when the connection pool is exhausted.
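
The concurrency comparison can be scripted along these lines. This is only a sketch: it assumes ab prints its usual "Percentage of the requests served within a certain time (ms)" table, and `ab_summary` and `sweep` are hypothetical helper names, not anything from the patch.

```shell
# Pull the median and the longest request (both in ms) out of ab's output on stdin.
ab_summary() {
  awk '
    $1 == "50%"  { median = $2 }
    $1 == "100%" { longest = $2 }
    END { printf "median_ms=%s longest_ms=%s\n", median, longest }'
}

# Run ab at each given concurrency level and print one summary line per level.
# usage: sweep URL C1 [C2 ...]
sweep() {
  url=$1
  shift
  for c in "$@"; do
    printf 'c=%s ' "$c"
    # -q: suppress progress messages, -n: total requests, -c: concurrency
    ab -q -n 2000 -c "$c" "$url" | ab_summary
  done
}
```

Running `sweep http://mobile-imaging-002.p2.releng.scl1.mozilla.com/api/device/panda-0101/state/ 4 10 20` before and after a config change gives one line per concurrency level to compare.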

Another fix is to add

 SetEnv force-proxy-request-1.0 1

as suggested in http://httpd.apache.org/docs/2.2/mod/mod_proxy.html.  This "fixes" the problem the same way disablereuse=On does.  So, I'm guessing that web.py has a bug in its implementation of HTTP/1.1 and connection re-use.
Attached patch bug825766.patch
Attachment #697170 - Flags: review?(jwatkins)
Attachment #697170 - Flags: review?(jwatkins) → review+
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations