Closed Bug 1350007 Opened 8 years ago Closed 8 years ago

zlb healthcheck for reviewboard-hg doesn't fully consume response

Categories

(MozReview Graveyard :: Infrastructure, enhancement)

Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: gps, Assigned: fubar)

As part of investigating bug 1338530, I discovered that the zlb health check monitoring the / endpoint on reviewboard-hg.mozilla.org is sending a TCP RST packet before fully consuming the HTTP response. This confuses httpd's logging and results in it recording the HTTP response as a 500 instead of 200 (which is sent over the wire).

It appears the zlb health check for reviewboard-hg is configured to only care about the first 2048 bytes of the response. The zlb health check for hg.mozilla.org is a custom check using some Perl script. Since reviewboard-hg is essentially the same as hg.mozilla.org, could we switch reviewboard-hg to the same health check as hg.mozilla.org?

This /shouldn't/ matter for bug 1338530. But I'd like to rule out the possibility that premature response truncation is causing the server to get in a bad state.
There's also a race between the zlb sending the RST and the origin server finishing the HTTP response and closing the TCP connection. This is because the zlb issues the RST after receiving 2 KB (2048 bytes), regardless of how much of the response the server has written by then.
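For anyone trying to reproduce this outside the zlb, the two client behaviours can be sketched in Python. This is only an illustration under assumptions, not the zlb's actual implementation: the function names `fetch_prefix` and `fetch_all` are hypothetical, and the 2048-byte limit is taken from the observation above. On many TCP stacks (Linux included), closing a socket that still has unread received data makes the kernel send an RST instead of a normal FIN, which is the behaviour `fetch_prefix` is meant to mimic.

```python
import socket


def _send_get(sock, host, path):
    # HTTP/1.0 without keep-alive, so the server closes the connection
    # once it has written the complete response.
    request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")
    sock.sendall(request)


def fetch_prefix(host, port, path="/", limit=2048):
    """Read at most `limit` bytes of the response, then close.

    Hypothetical stand-in for the truncating health check: if the
    response is longer than `limit`, the rest is never consumed.
    """
    with socket.create_connection((host, port), timeout=5) as sock:
        _send_get(sock, host, path)
        data = b""
        while len(data) < limit:
            chunk = sock.recv(limit - len(data))
            if not chunk:  # server closed before we hit the limit
                break
            data += chunk
        return data


def fetch_all(host, port, path="/"):
    """Read the response until the server closes the connection."""
    with socket.create_connection((host, port), timeout=5) as sock:
        _send_get(sock, host, path)
        chunks = []
        while True:
            chunk = sock.recv(65536)
            if not chunk:  # EOF: the server finished and closed
                break
            chunks.append(chunk)
        return b"".join(chunks)
```

Pointing `fetch_prefix` at a test server whose response is larger than 2 KB and watching the connection with a packet capture should show whether the close produces an RST on a given platform; `fetch_all` is the well-behaved counterpart for comparison.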
To elaborate on my concern with regard to bug 1338530: I'm worried that when Apache discards the reset TCP socket, its internal state tracking active TCP connections gets "corrupted", and that this influences a later HTTP request. From what I remember about the internals of httpd from hacking on C extensions 8+ years ago, this is in the realm of possibility, especially since mod_wsgi maintains its own process pool and each process maintains state to communicate with the parent httpd process. If a WSGI request is interrupted by a remote RST and the connection tracking in mod_wsgi gets out of whack, we could see effects on subsequent requests.
(In reply to Gregory Szorc [:gps] from comment #0)
> The zlb health check for hg.mozilla.org is a custom check using some Perl script.

Python, but yes. I don't remember the origin of it, though.

> Since reviewboard-hg is essentially the same as hg.mozilla.org, could we
> switch reviewboard-hg to the same health check as hg.mozilla.org?

Switched the health monitor from "Full HTTP" to "http-external (hg)".
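The real "http-external (hg)" script is not shown in this bug, so the following is only a rough sketch of what an external check that fully consumes the response could look like. Everything here is an assumption for illustration: the names `check` and `main`, the argument order, and the convention that exit status 0 means healthy and 1 means unhealthy.

```python
import http.client


def check(host, port=80, path="/", timeout=10.0):
    """Return True if GET `path` answers 200 and the whole body can be read."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path, headers={"Connection": "close"})
        response = conn.getresponse()
        # Drain the body completely so the server can finish the response
        # and the connection is closed normally rather than reset mid-stream.
        while response.read(65536):
            pass
        return response.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, truncated body, malformed response.
        return False
    finally:
        conn.close()


def main(argv):
    """Hypothetical entry point: argv is [prog, host, optional port]."""
    host = argv[1] if len(argv) > 1 else "localhost"
    port = int(argv[2]) if len(argv) > 2 else 80
    # Assumed convention: exit status 0 is healthy, 1 is unhealthy.
    return 0 if check(host, port) else 1
```

A wrapper script would end with something like `sys.exit(main(sys.argv))`. The point of the sketch is only the drain loop: the check decides on the status code, but it still reads to the end of the body before closing.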
Confirmed that we're no longer seeing the reported HTTP 500s in the httpd logs. Closing. I hope this doesn't fix things because I don't want to go down the rabbit hole that would require.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Assignee: nobody → klibby