Closed Bug 400876 Opened 18 years ago Closed 15 years ago

sentry should employ threads

Categories

(Webtools :: Bouncer, enhancement, P3)

Tracking

(Not tracked)

RESOLVED FIXED
Future

People

(Reporter: wenzel, Assigned: justdave)

Details

(Whiteboard: [tx])

With the recent changes to support languages (bug 395232) and hash checking (bug 371237), sentry's work has become more complex. In particular, it now has to handle between 30,000 and 40,000 URLs in a single run, many of which are on the same server. If we execute these checks sequentially and even one server is very slow, a run may take hours while we wait for replies or timeouts. To avoid this problem, we need parallelism. The easiest way to do this is to use threads, feeding a new URL to whichever thread has just finished checking a file. If this is easily done in Perl, we should do so; if it is a pain, we should consider rewriting sentry in, say, Python. Fortunately, sentry isn't all that complicated or big.
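For illustration only, here is a minimal sketch of the "feed a new URL to whichever thread has just finished" idea, written in Python since that is the rewrite language floated above. It is not sentry's code: the names `check_url` and `check_all`, the HEAD-request check, the 10-second timeout, and the pool size of 20 are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import urllib.error
import urllib.request


def check_url(url, timeout=10):
    """Hypothetical check: True if a HEAD request succeeds within `timeout` seconds."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        # HTTP errors, connection failures, timeouts, and malformed URLs
        # all count as a failed check in this sketch.
        return False


def check_all(urls, check=check_url, max_workers=20):
    """Check every URL using a fixed pool of worker threads.

    Each worker picks up the next pending URL as soon as it finishes its
    current one, so a slow server only ties up one worker at a time.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(check, url): url for url in urls}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results
```

With a pool like this, the total run time is bounded roughly by the slowest worker's share of the queue instead of the sum of all timeouts. Note that if many URLs live on the same slow server, several workers can still end up waiting on it at once.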
We had already talked about the possibility of backporting the work Lars did for Sentry2. Any thoughts on what we could reuse from that codebase?
Yes, I remember us talking about this. We may reuse quite a bit, I think. At the very least, we could use Lars' threading library, saving us quite a chunk of work right there. So my suggestion would be: if we can get threads implemented in the Perl script relatively easily, we should do that. If that turns out to be as much work as (partly) backporting sentry2, or more, we should do the latter.
I just hacked up a quick and dirty version of this by shelling out and backgrounding. After getting sentry.pl to be able to check a specific mirror in bug 457612, I wrote a wrapper around it that grabs the list of all of the mirrors that need to be checked, then shells out to the mirror-specific version for each mirror in the list and backgrounds it. mradm02 can handle the load fine right now, even though that's doing all 102 of them in parallel, but that's not going to scale infinitely. For scalability it'll need some kind of thread management to make sure it only uses X number of threads or whatever.
Severity: normal → enhancement
OS: Mac OS X → All
Hardware: x86 → All
Target Milestone: --- → Future
Whiteboard: [tx]
Priority: -- → P3
Assignee: fwenzel → nobody
Assignee: nobody → fwenzel
Assignee: fwenzel → justdave
Did you want what I did already? It's the perl/sentry-multi.pl file in the production branch.
Thanks! I added that to the sentry directory: http://github.com/fwenzel/tuxedo/tree/master/sentry/
If you think this is sufficient for now, we can close this bug. I am thinking that once we run a staging site for the new bouncer, we might need to tweak this some more, but until then there's not much we can do, right?
Note that file is dependent on the changes to sentry.pl to allow you to pass a single mirror ID to check, so make sure that's been ported as well. It's probably good for now. Eventually we should make it keep track of how many it's spawned and limit how many can be in progress at once. Doing an internal fork within Perl may be more resource-efficient than shelling out to a new copy of Perl for each one, too.
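As a rough sketch of "keep track of how many it's spawned and limit how many can be in progress at once", the loop below starts one child process per mirror but never has more than `max_running` alive. It is written in Python for illustration and is not sentry-multi.pl. The helper names `run_limited` and `sentry_command` are hypothetical, and the command line passed to sentry.pl is a guess, since the real argument syntax is not given in this bug.

```python
import subprocess
import time


def run_limited(mirror_ids, build_command, max_running=10, poll_interval=0.1):
    """Run one child process per mirror, with at most `max_running` alive at once.

    `build_command` maps a mirror ID to an argv list.
    Returns a dict mapping each mirror ID to its child's exit code.
    """
    if max_running < 1:
        raise ValueError("max_running must be at least 1")

    pending = list(mirror_ids)
    running = {}      # mirror ID -> Popen handle of a child still in progress
    exit_codes = {}

    while pending or running:
        # Start new children until the limit is reached.
        while pending and len(running) < max_running:
            mirror_id = pending.pop(0)
            running[mirror_id] = subprocess.Popen(build_command(mirror_id))

        # Collect children that have finished, freeing their slots.
        for mirror_id, child in list(running.items()):
            code = child.poll()
            if code is not None:
                exit_codes[mirror_id] = code
                del running[mirror_id]

        if running:
            time.sleep(poll_interval)

    return exit_codes


def sentry_command(mirror_id):
    # Assumed invocation; the actual way sentry.pl takes a mirror ID may differ.
    return ["perl", "sentry.pl", str(mirror_id)]
```

A call such as `run_limited(mirror_ids, sentry_command, max_running=10)` would then cap the load on the host regardless of how many mirrors are listed. This still starts a fresh interpreter per mirror; forking inside one process, as suggested above, would avoid that startup cost.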
(In reply to comment #6)
> note that file is dependent on the changes to sentry.pl to allow you to pass a
> single mirror ID to check, so make sure that's been ported as well.

I imported sentry straight from the production branch. I tested it with a prod checkout and it seems to work.

> It's probably good for now. Eventually we should make it keep track of how
> many it's spawned and limit how many can be in progress at once. Doing an
> internal fork within Perl may be more resource-efficient than shelling out to a
> new copy of Perl for each one, too.

If you want, I can give you commit access to the github repo; or you can wait until git.m.o exists (bug 528360), and play around with Perl-internal threading then.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED