Closed Bug 468480 Opened 16 years ago Closed 16 years ago

hg.mozilla.org very slow or nonfunctional (pushlog db problems?)

Categories

(mozilla.org Graveyard :: Server Operations, task)

All
Other
task
Not set
blocker

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dbaron, Assigned: aravind)

Details

hg.mozilla.org seems to be having serious problems right now: http://hg.mozilla.org/mozilla-central/pushloghtml doesn't load in any reasonable amount of time. When I pushed 15 minutes ago, it took at least half a dozen tries. The first several attempts all failed with a series of errors ending with:
remote: pysqlite2.dbapi2.OperationalError: database is locked
abort: unexpected response: empty string
(Sorry, I don't have the rest of the errors anymore; it was a Python exception stack, or something like that.)
I'm seeing hgweb be nonfunctional, even when not hitting the pushlog. I suspect this is not the pushlog's fault, perhaps something is spidering hgweb?
(But note that pushing to my user repo is fine, so it's mozilla-central specific or something like that.)
(Then again, the web interface for my user repo is not fine.)
That would lead me to believe that something is spidering hg.mozilla.org/mozilla-central/{pushlog,pushloghtml,json-pushes}. That would cause lots of queries against the pushlog.db, which would keep it locked, and also lots of load on hgweb. (The actual HG server is on a separate machine, so it's probably unaffected, aside from the pushlog db getting locked by read-only queries.)
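For anyone following along, here is a minimal sketch of how a read-only query can produce the "database is locked" error quoted above. It assumes SQLite's default rollback-journal locking and uses Python's standard sqlite3 module rather than pysqlite2. The file name, the pushes table and its columns are made up for illustration; the real pushlog.db schema is not shown in this bug.

```python
import os
import sqlite3
import tempfile


def demonstrate_reader_blocking_writer():
    """Return (first_write_ok, second_write_ok, row_count)."""
    # Hypothetical stand-in for pushlog.db; the schema is invented.
    db_path = os.path.join(tempfile.mkdtemp(), "pushlog-demo.db")

    setup = sqlite3.connect(db_path)
    # Assumption: rollback-journal mode, where readers block commits.
    setup.execute("PRAGMA journal_mode=DELETE").fetchall()
    setup.execute("CREATE TABLE pushes (id INTEGER PRIMARY KEY, user TEXT)")
    setup.execute("INSERT INTO pushes (user) VALUES ('someone')")
    setup.commit()
    setup.close()

    # isolation_level=None: the module issues no implicit BEGIN/COMMIT.
    reader = sqlite3.connect(db_path, isolation_level=None)
    # timeout=0: fail immediately instead of waiting for the lock.
    writer = sqlite3.connect(db_path, timeout=0, isolation_level=None)

    def try_push():
        try:
            writer.execute("INSERT INTO pushes (user) VALUES ('pusher')")
            return True
        except sqlite3.OperationalError as exc:
            # Only swallow the error message reported in this bug.
            if "database is locked" not in str(exc):
                raise
            return False

    # An open read transaction keeps a SHARED lock on the file.
    reader.execute("BEGIN")
    reader.execute("SELECT COUNT(*) FROM pushes").fetchall()

    first = try_push()        # reader still open: the write is refused
    reader.execute("COMMIT")  # reader finishes and releases its lock
    second = try_push()       # the same write now goes through

    count = writer.execute("SELECT COUNT(*) FROM pushes").fetchone()[0]
    reader.close()
    writer.close()
    return first, second, count


if __name__ == "__main__":
    print(demonstrate_reader_blocking_writer())
```

If readers arrive faster than they finish, a writer may never find a gap in which to commit, which would fit the repeated push failures described above. A nonzero timeout on the writer only helps when such a gap eventually appears.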
Working on it.
Assignee: server-ops → aravind
This should be cleared up now. Not sure what triggered it in the first place (I had to reboot the boxes).
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
This looks broken again. Aravind: I think there must be some real root cause here. I still suspect spidering, but clearly I have no way to verify that.
Assignee: aravind → server-ops
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee: server-ops → thardcastle
It's been up and down all night because of load. I'm still trying to get it stable.
Should be more stable now. One of the vmware hosts (pm-vmware02) involved here lost it and dm-vcview02 went missing/unresponsive and couldn't be reset, overloading dm-vcview01. Handing off to aravind for more investigation. The load is still high, there are some outside ips hitting it frequently but no one really spidering it that I could tell.
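One rough way to check for spidering is to count requests per client IP against the pushlog endpoints named earlier in this bug. The sketch below assumes the web server writes Apache common or combined log format; the actual log format and location on these hosts are not stated here, and the sample lines are invented.

```python
import re
from collections import Counter

# Endpoints named in this bug, matched as whole path segments.
PUSHLOG_ENDPOINTS = ("pushlog", "pushloghtml", "json-pushes")

# Assumed Apache common/combined format: client IP first, then the
# quoted request line ("METHOD PATH PROTOCOL").
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "\S+ (\S+)')


def count_pushlog_hits(lines):
    """Count requests to pushlog endpoints per client IP."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        client, path = match.groups()
        # Drop the query string, then compare each path segment.
        segments = path.split("?", 1)[0].strip("/").split("/")
        if any(segment in PUSHLOG_ENDPOINTS for segment in segments):
            hits[client] += 1
    return hits


if __name__ == "__main__":
    # Invented sample lines using documentation-only IP addresses.
    sample = [
        '192.0.2.10 - - [01/Jan/2000:00:00:00 +0000] '
        '"GET /mozilla-central/pushloghtml HTTP/1.1" 200 512',
        '198.51.100.7 - - [01/Jan/2000:00:00:01 +0000] '
        '"GET /mozilla-central/file/tip HTTP/1.1" 200 100',
    ]
    for client, n in count_pushlog_hits(sample).most_common(10):
        print(client, n)
```

A single address dominating the counts would point toward a spider; an even spread, as reported here, would point elsewhere.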
Assignee: thardcastle → aravind
As trevor said, I couldn't find anyone spidering it either. I will leave the bug open and try to find more. Can you guys tell me when this started happening? At what time did you notice the slowdown?
11:46 < Pike> is hg.m.o slow-up-to-dead for other folks, too?
11:47 < djc> Pike: yeah, looks like it
(That's CET.)
Not sure if this will help but after about 10 minutes I got this:
Proxy Error
The proxy server received an invalid response from an upstream server. The proxy server could not handle the request GET /.
Reason: Error reading from remote server
Apache/2.2.3 (Red Hat) Server at hg.mozilla.org Port 80
This seems to be a problem with the ESX servers hosting the VM. We are looking at it. Will update the bug once we figure out what's going on with it.
We now have a case open with vmware. For now, I am routing this traffic to a different server, so hgweb should work okay. I will move it back to the VMs once we have resolution on the vmware problems.
This should be fixed now. Please re-open as necessary. We migrated the VMs off a problem netapp, so hopefully we won't see these problems again.
Status: REOPENED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard