Closed
Bug 452102
Opened 17 years ago
Closed 17 years ago
l10n.nl.mozilla.org spiders mercurial too quickly
Categories: Release Engineering :: General, defect
Tracking: Not tracked
Status: RESOLVED FIXED
People
(Reporter: chizu, Assigned: benjamin)
Attachments
(1 file)
2.51 KB, patch (bhearsum: review+)
Upwards of 30 simultaneous requests are overloading hg.mozilla.org daily.
Example request:
"GET /l10n-central/en-GB/index.cgi/pushlog HTTP/1.0" 200 34927 "-" "Twisted PageGetter"
Comment 1 (Assignee) • 17 years ago
What's wrong with 30 simultaneous requests?
Comment 3 (Assignee) • 17 years ago
What does "takes down hg" mean? Are other connections slow for a few msecs/secs/minutes? Do you have to manually restart some process?
The pushlog requests used to be pretty trivial database queries... Ted, now that we're requesting file information and other stuff for the feed, is the request a lot more involved?
Comment 4 • 17 years ago
I expect to optimize out the file stuff as soon as we get bug 449381 fixed, at which point the responses should generally be empty and touch only the db, not the repo at all (including for generating the file lists).
Comment 5 • 17 years ago
PS: bug 443600, which we resolved as WORKSFORME, has more ideas, with harder dependencies on the server side.
Comment 6 • 17 years ago
To reiterate what Benjamin said, unless we know what to optimize for, it's hard to invest cycles in the right thing. So more data would be valuable.
Requests to many locales at hg.mozilla.org/l10n-central/LOCALE/index.cgi/pushlog are made at once. Requests start to time out from high server load and IT is paged. It recovers a few minutes later.
We can watch for which locales are causing the most load next time it happens, in case that's useful.
Comment 8 • 17 years ago
Would also be helpful to know which parts of hg/hgweb are taking a lot of time. We might be able to optimize them; I don't think much optimization has been done on hgweb, since few people seem to need it (though of course this is mostly the pushlog, which is a Mozilla thing).
Comment 9 • 17 years ago
I am working on getting the WSGI interface for Mercurial to function correctly. This should be able to handle a bunch of simultaneous requests correctly. I will comment here once things start to work faster.
Updated • 17 years ago
Assignee: nobody → server-ops
Component: Infrastructure → Server Operations
Product: Mozilla Localizations → mozilla.org
QA Contact: infrastructure → mrz
Version: unspecified → other
Updated • 17 years ago
Assignee: server-ops → aravind
Comment 10 • 17 years ago
Aravind: I'd be interested to hear what you're changing. Are you moving to Apache's mod_wsgi? That is AFAIK the most performant solution for deploying WSGI applications (such as hgweb).
Comment 11 • 17 years ago
(In reply to comment #10)
> Aravind: I'd be interested to hear what you're changing. Are you moving to
> Apache's mod_wsgi? That is AFAIK the most performant solution for deploying
> WSGI applications (such as hgweb).
Yup, I am moving to mod_wsgi. I am not sure if it's from Apache; the code I have is from http://code.google.com/p/modwsgi/ and I don't think it's an official Apache module. It should be ready sometime tomorrow.
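For context, mod_wsgi (whichever implementation) serves any Python module that exposes a WSGI callable named `application`, which is exactly what hgweb provides. A minimal stdlib sketch of that contract follows; the response body here is a placeholder, not what hgweb actually returns:

```python
# Minimal sketch of the WSGI contract mod_wsgi expects from a module.
# hgweb exposes a real `application` like this; the body is a placeholder.
def application(environ, start_response):
    # A WSGI app receives the request environ and a start_response callable,
    # announces the status and headers, then returns an iterable of bytes.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"pushlog placeholder\n"]
```

Because the callable is plain Python, it can be exercised without a web server at all, which is handy for testing deployment glue.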
Comment 12 • 17 years ago
I don't mean *from* Apache, but wanted to distinguish it from nginx's mod_wsgi (which might be another option), and I think there's one more now.
Are you also getting rid of the "index.cgi" in the URLs, or is that now set in stone forever?
Comment 13 (Assignee) • 17 years ago
On my buildbot I have applied the following customization, which throttles the calls so there are never more than 4 simultaneous ones: http://hg.mozilla.org/users/bsmedberg_mozilla.com/buildbotcustom/rev/b8da2f63697c
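The actual patch lives in buildbotcustom and uses Twisted, but the throttling idea (never more than 4 requests in flight) can be sketched with a stdlib semaphore; the fetch coroutine and locale list below are stand-ins, not the real poller:

```python
import asyncio

MAX_IN_FLIGHT = 4  # cap on simultaneous pushlog requests, as in the patch

async def fetch_pushlog(sem, locale):
    # Acquire a semaphore slot before "fetching"; at most MAX_IN_FLIGHT
    # coroutines get past this line at the same time.
    async with sem:
        await asyncio.sleep(0)  # stand-in for the real HTTP request
        return locale

async def poll_all(locales):
    sem = asyncio.Semaphore(MAX_IN_FLIGHT)
    # gather() still schedules every request, but the semaphore serializes
    # them into batches of at most four.
    return await asyncio.gather(*(fetch_pushlog(sem, loc) for loc in locales))

results = asyncio.run(poll_all(["de", "en-GB", "fr", "ja", "nl", "pl"]))
```

The same shape works for any bounded-concurrency client; only the constant and the body of `fetch_pushlog` would change.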
Comment 14 • 17 years ago
I was wondering today if it would help if there were changegroup hooks instead of pollers (not sure if the pollers actually poll?).
Comment 15 • 17 years ago
Here is what I did this morning:
I removed the index.cgi from my links, to get rid of the redirect requests. That didn't help a whole lot.
I backed off my buildmaster's polling interval from 3 to 5 minutes, which, at least over the course of today, seemed to help a bit.
I have more ideas, namely moving the polls on the individual feeds into the LoopingCall, with some statistics on the response time and a given timeout on getPage. But that's a different bug.
Re comment 14, I don't see how changegroup hooks would help, as we need to bridge the gap between the hg server and the buildbot masters. The main point is that we want to keep the server up while letting as many masters ask for changes as we see fit. The changegroup hook method basically means we'd need to change the hook setup each time we set up a staging environment or similar. That doesn't sound too scalable.
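Comment 15's idea of a per-request timeout plus response-time bookkeeping can be sketched with stdlib asyncio stand-ins for Twisted's LoopingCall/getPage; the timeout value and the fake fetch are assumptions for illustration:

```python
import asyncio
import time

REQUEST_TIMEOUT = 30.0  # hypothetical per-request timeout in seconds

async def poll_once(fetch, stats):
    """Run one poll with a hard timeout, recording the response time."""
    start = time.monotonic()
    try:
        body = await asyncio.wait_for(fetch(), REQUEST_TIMEOUT)
    except asyncio.TimeoutError:
        body = None  # a timed-out request no longer wedges the poller
    stats.append(time.monotonic() - start)  # response-time statistics
    return body

async def fake_fetch():
    # Stand-in for the real pushlog HTTP request.
    return b"pushlog"

stats = []
result = asyncio.run(poll_once(fake_fetch, stats))
```

The key property is that a hung request costs at most `REQUEST_TIMEOUT` seconds instead of sticking forever, which is the failure mode comment 19 ran into.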
Comment 16 • 17 years ago
You could have the hook ping some script and have that script start all kinds of stuff (and more easily modify the script)? At least you're not polling every few minutes, then. And you could put some of the intelligence in the intermediary script, so you can select what buildbots to start/whether you want to start any.
Maybe introducing an extra layer makes it too complex, though.
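For reference, comment 16's suggestion would look something like this in the server-side repository's .hg/hgrc; the notify URL is a made-up example and the intermediary script is hypothetical:

```ini
[hooks]
# Ping an intermediary script after each push; that script decides which
# buildbot masters (if any) to kick off. HG_NODE is the first changeset
# the push added. The host below is a made-up example.
changegroup.notify = curl -s "http://buildmaster.example.org/notify?node=$HG_NODE"
```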
Comment 17 • 17 years ago
Nah, that ain't gonna work. We should make polling cheap first.
Then we can load balance the polling. And then we can talk about some mirror. Or about aggregating the l10n feeds on the server side.
Comment 18 • 17 years ago
Is the new RR setup holding up or are you guys still noticing problems pulling in the multiple l10n trees at the same time?
Comment 19 • 17 years ago
I ran into a stale poller again, I'll need to add timeouts.
Comment 20 • 17 years ago
Patch in preparation: I got timeouts working and de-parallelized the l10n poller.
Polling all locales in parallel makes the individual queries take up to 6 seconds to respond; when I move that down to two parallel requests, they take between 1 and 2 seconds.
I still need to set up my new hardware to actually test this locally, though.
Comment 21 (Assignee) • 17 years ago
This patch should fix the problem at hand as well as bug 453457 (HgPoller sticking without an errback).
Updated • 17 years ago
Attachment #337473 - Flags: review?(bhearsum) → review+
Comment 22 (Assignee) • 17 years ago
Fixed in CVS.
Assignee: benjamin → nobody
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Component: Server Operations → Release Engineering
QA Contact: mrz → release
Resolution: --- → FIXED
Updated • 14 years ago
Assignee: nobody → benjamin
Updated • 12 years ago
Product: mozilla.org → Release Engineering