Closed Bug 1172739 Opened 9 years ago Closed 9 years ago

Often getting an HTTP 500 Internal Server Error thrown from http://hg.mozilla.org/mozilla-central/summary

Categories

(Developer Services :: Mercurial: hg.mozilla.org, defect)

defect
Not set
major

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: stephend, Unassigned)


Attachments

(1 file)

Attached image hg-zeus-500.png
Often, when I reload http://hg.mozilla.org/mozilla-central/summary, I get an HTTP 500 thrown.

See http://www.webpagetest.org/result/150609_9B_1FZ/ and the attached screenshot.
Yeah, we've been looking at this in #vcs for a little while now. Not sure what's going on.
The 500s started appearing at high frequency on 2015-06-08 at 16:14 UTC. That's 09:14 PDT.
To rule one thing out: we didn't do any production deployments to hg.mozilla.org around the time the 500 frequency jumped up.
The exceptions in the server process look like http://bz.selenic.com/show_bug.cgi?id=4451.
I don't see any pushes that obviously correlate with the increase in HTTP 500s.
Around the time the 500 rate increased (16:14 UTC / 09:14 PDT):

* There were no obvious pushes (closest was https://hg.mozilla.org/integration/fx-team/pushloghtml?changeset=a733c1ca6e55)
* No Puppet runs were recorded
* No Ansible deploys were performed
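
For anyone repeating this kind of correlation, a minimal sketch of how the jump in 500s could be located in the access logs is below. It assumes an Apache "combined"-style log line; the actual hg.mozilla.org log format may differ, and `LINE_RE` and `count_500s_per_hour` are hypothetical names used only for illustration.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Assumption: Apache "combined"-style access log lines, e.g.
#   192.0.2.1 - - [08/Jun/2015:16:14:02 +0000] "GET /path HTTP/1.1" 500 512 "-" "ua"
LINE_RE = re.compile(r'\[(?P<ts>[^\]]+)\] "(?P<req>[^"]*)" (?P<status>\d{3}) ')


def count_500s_per_hour(lines):
    """Return a Counter mapping 'YYYY-MM-DD HH:00' (UTC) to the number of 500s."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if not m or m.group("status") != "500":
            continue  # unparseable line, or not a 500 response
        ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        # Normalize to UTC so hourly buckets line up with push and deploy times.
        bucket = ts.astimezone(timezone.utc).strftime("%Y-%m-%d %H:00")
        counts[bucket] += 1
    return counts
```

Sorting the resulting buckets by key gives a per-hour timeline that can be compared against push, Puppet, and Ansible timestamps.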

Since this kinda/sorta looks like an upstream Mercurial bug, my guess is some new client behavior started tickling it somehow. I don't have any leads at this point.
Oh, I'm pretty sure I haven't seen any errors for wire protocol requests: this is only impacting html/json pages. It may impact pushlog requests, although it's difficult to tell which pushlog requests are important and which are not.
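
A rough sketch of how failing requests could be split into wire protocol, pushlog, and other web page traffic is below. It relies on Mercurial's HTTP wire protocol requests carrying a `cmd` query parameter. The pushlog path check is an assumption based on the `pushloghtml` URL seen in this bug plus a guessed `json-pushes` endpoint, and the function names are hypothetical.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit


def classify_request(target):
    """Classify a request target as 'wireproto', 'pushlog', or 'web'."""
    parts = urlsplit(target)
    # Mercurial wire protocol requests over HTTP carry ?cmd=<command>.
    if "cmd" in parse_qs(parts.query):
        return "wireproto"
    last_segment = parts.path.rstrip("/").rsplit("/", 1)[-1]
    # Assumption: pushlog views end in pushlog* or json-pushes.
    if last_segment.startswith("pushlog") or last_segment == "json-pushes":
        return "pushlog"
    return "web"


def tally_500s(requests):
    """Count 500 responses per class from (target, status) pairs."""
    return Counter(classify_request(t) for t, status in requests if status == 500)
```

If the observation above holds, the `wireproto` count stays at zero while `web` (and possibly `pushlog`) accumulate the failures.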
It turns out this is a 3+ year old bug in Mercurial (since version 2.2)! It is triggered by making a specific type of request to the server. It just so happens that we had an agent yesterday making many of these requests, triggering the bug.

The Mercurial maintainer seems to have taken an interest in this bug and I suspect a patch will be produced shortly.

As a workaround, we should add the following to our vhost config:

  WSGIApplicationGroup %{GLOBAL}
I deployed the WSGIApplicationGroup %{GLOBAL} workaround yesterday (shortly after comment #9 was posted) and confirmed from logs this morning that we aren't seeing any more RevlogError.

I also submitted a patch to Mercurial to get this fixed. https://selenic.com/pipermail/mercurial-devel/2015-June/071051.html

I'm going to call this fixed: the workaround is sufficient, so there is nothing left for us to do.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Verified FIXED; I haven't seen this since.
Status: RESOLVED → VERIFIED