Closed
Bug 52573
Opened 24 years ago
Closed 16 years ago
Bonsai doesn't scale very well
Categories
(Webtools Graveyard :: Bonsai, defect, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: rkotalampi, Assigned: bear)
References
Details
(Keywords: crash)
Attachments
(5 files, 1 obsolete file)
902 bytes, patch | cls: review+
2.47 KB, text/plain | cls: review+
2.29 KB, patch | cls: review+
3.42 KB, patch | cls: review+
1.03 KB, patch | cls: review+, reed: review+
When the tree is spanked badly and there are a lot of modifications, it drains a huge amount of resources: one mail per file, each of those forking a long chain of Perl programs and eating a lot of memory. Currently we have some limitations in sendmail, but they don't seem to help, because in many cases the problem is not the load but the memory all these big processes take. Bonsai needs to be rearchitected, either on the CVS side or on the database side, to scale better to situations like this.
Comment 1•24 years ago
we have the same problem on warp, although typically the changes are not quite as large.
<rayw> Define "brought the whole system to hork".
<risto> rayw, bugzilla/tinderbox/bonsai crashed.... mysqld crashed... sshd crashed... inetd crashed
Keywords: crash
Reporter
Comment 3•24 years ago
Let me just add here that we already have limitations in sendmail that were supposed to prevent this from happening:
O MaxDaemonChildren=25
O QueueLA=8
But like I wrote, my guess is that it's not a load issue; it's the memory that these 25 children plus the process chain will eat. Let me see if I can find some sar data about today's incident.
Reporter
Comment 4•24 years ago
CPU usage:
SunOS lounge.mozilla.org 5.6 Generic_105181-16 sun4u 09/13/00
16:50:00  %usr  %sys  %wio  %idle
16:51:00    53    14     6     28
16:52:00    81    17     1      1
16:53:01    79    18     3      1
16:54:00    75    21     3      0
16:55:01    64    23    12      0
16:56:50    50    20    29      2
16:57:00    52    18    30      0
16:58:01    44    16    39      0
Average     62    19    15      4
CPU load looks ok to me...
---
Memory (these are disk blocks, 512b each):
SunOS lounge.mozilla.org 5.6 Generic_105181-16 sun4u 09/13/00
16:50:00  freemem  freeswap
16:51:00    43401   5025628
16:52:00    11643   4359608
16:53:01     4103   3431154
16:54:00     4089   2551139
16:55:01     3203   1760839
16:56:50     2572    875564
16:57:00     2428    260161
16:58:01     3612    204572
Average      9402   2374776
OUCH! In 7 minutes someone allocated about 2G.
---
Unfortunately at 16:58 crond died and we didn't get more data but I'm sure the idea can be seen in these. Someone used all the memory and swap in the server.
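The "about 2G" estimate can be checked from the freeswap column. The following is a minimal sketch that takes the samples from the sar output in this comment and assumes, as the comment states, that freeswap is counted in 512-byte blocks; the names `SAMPLES`, `BLOCK_BYTES` and `swap_consumed_bytes` are made up for illustration.

```python
# freeswap samples copied from the sar output in this comment:
# (timestamp, free swap in 512-byte blocks, per the comment's own note)
SAMPLES = [
    ("16:51:00", 5025628),
    ("16:52:00", 4359608),
    ("16:53:01", 3431154),
    ("16:54:00", 2551139),
    ("16:55:01", 1760839),
    ("16:56:50", 875564),
    ("16:57:00", 260161),
    ("16:58:01", 204572),
]

BLOCK_BYTES = 512  # assumption taken from the comment ("512b each")


def swap_consumed_bytes(samples, block_bytes=BLOCK_BYTES):
    """Return how many bytes of swap disappeared between the first and last sample."""
    first_free = samples[0][1]
    last_free = samples[-1][1]
    return (first_free - last_free) * block_bytes


consumed = swap_consumed_bytes(SAMPLES)
print(f"{consumed} bytes, i.e. {consumed / 2**30:.2f} GiB of swap consumed")
```

Under that assumption the drop from 5025628 to 204572 free blocks comes to a little over 2 GiB in seven minutes, which is consistent with the reporter's estimate.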
Comment 5•24 years ago
it was probably all the 1000+ sendmail and handlecheckin.pl processes.
Reporter
Comment 6•24 years ago
Not only that, because sendmail in theory should limit max children to 25. But handlecheckin.pl launches another program, which launches more, and more, and more....
Comment 7•24 years ago
someone outside cpd landed a massive checkin last night and warp was broken from 5:30-8:30pm due to lack of resources (memory & swap). Anyone have any thoughts on how to hack bonsai to throttle handlemail and/or addcheckin?
Comment 8•23 years ago
this is being worked on now along with other enhancements that give more control over a CVS repository
Status: NEW → ASSIGNED
Updated•22 years ago
Priority: P3 → P1
Comment 9•22 years ago
There's a dupe of this bug somewhere... ("addCheckin uses all available memory" or something to that effect) I remember seeing it earlier, but I can't go back to look for it without losing my place in the list. If I remember when I'm done with this buglist I'll go back and look.
Comment 10•22 years ago
*** Bug 39328 has been marked as a duplicate of this bug. ***
Comment 11•22 years ago
found it :)
Comment 12•22 years ago
By the definitions on <http://bugzilla.mozilla.org/bug_status.html#severity> and <http://bugzilla.mozilla.org/enter_bug.cgi?format=guided>, crashing and dataloss bugs are of critical or possibly higher severity. Only changing open bugs to minimize unnecessary spam. Keywords to trigger this would be crash, topcrash, topcrash+, zt4newcrash, dataloss.
Severity: major → critical
Comment 13•18 years ago
Somehow I suspect the "this is being worked on" from 2001 to have never materialized. Removing "assigned" status. This is something we need to fix in short order on bonsai.m.o... we've been getting bit by it again recently. We just made a change to tinderbox a few weeks back that could (and probably should) be borrowed for this. See bug 354462 and its dependencies. The script run by sendmail on the incoming mail just dumps the mail in a directory. A cron job then comes through and processes the stuff in the directory serially so you don't wind up with 50 processes running at once if a bunch of directories got touched on the same checkin.
Status: ASSIGNED → NEW
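The queue-then-process approach described in comment 13 can be sketched roughly as follows. This is an illustration only, not the tinderbox or bonsai code: the function names `enqueue_mail` and `process_spool`, the `.mail` suffix and the temp-file naming are all assumptions made for the example.

```python
import os
import tempfile
from pathlib import Path


def enqueue_mail(spool_dir, message):
    """Run once per incoming mail: write the raw message to the spool and return."""
    spool = Path(spool_dir)
    spool.mkdir(parents=True, exist_ok=True)
    # Write under a temporary name first so the processor never sees a partial file.
    fd, tmp_name = tempfile.mkstemp(dir=str(spool), prefix=".tmp-")
    with os.fdopen(fd, "wb") as handle:
        handle.write(message)
    unique = os.path.basename(tmp_name)[len(".tmp-"):]
    final_path = spool / (unique + ".mail")
    os.rename(tmp_name, final_path)  # atomic within one filesystem
    return final_path


def process_spool(spool_dir, handle_message):
    """Run from cron: handle queued mails one at a time, oldest first."""
    spool = Path(spool_dir)
    queued = sorted(spool.glob("*.mail"), key=lambda p: p.stat().st_mtime)
    processed = 0
    for path in queued:
        handle_message(path.read_bytes())
        path.unlink()  # delete only after the handler returned without raising
        processed += 1
    return processed
```

The point of the split is that the per-mail step does almost no work, so a checkin touching many directories produces many small files rather than many concurrent heavyweight processes; the expensive work happens in a single process on the cron schedule.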
Updated•18 years ago
OS: Solaris → All
Updated•18 years ago
QA Contact: timeless → bonsai
Assignee
Updated•17 years ago
Assignee: tara → bear
Assignee
Comment 14•17 years ago
this updates handleCheckinMail.pl to write the mail files to the data directory
Attachment #274046 -
Flags: review?(cls)
Assignee
Comment 15•17 years ago
Attachment #274047 -
Flags: review?(cls)
Attachment #274046 -
Flags: review?(cls) → review+
Comment 16•17 years ago
Comment on attachment 274047 [details]
new file to process all mail files
Remove the & before function calls && r=cls
Attachment #274047 -
Flags: review?(cls) → review+
Assignee
Comment 17•17 years ago
Checking in handleCheckinMail.pl;
/cvsroot/mozilla/webtools/bonsai/handleCheckinMail.pl,v <-- handleCheckinMail.pl
new revision: 1.8; previous revision: 1.7
done
RCS file: /cvsroot/mozilla/webtools/bonsai/processMail.pl,v
done
Checking in processMail.pl;
/cvsroot/mozilla/webtools/bonsai/processMail.pl,v <-- processMail.pl
initial revision: 1.1
done
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Comment 18•17 years ago
Are there instructions somewhere how to deploy this? I assume it needs a cron job set up and so forth.
Assignee
Comment 19•17 years ago
doh! I am committing the following to the INSTALL file (just below where it says how to set up handleCheckinMail.pl):
To process the queued mail from handleCheckinMail.pl, you will need to set up a cron job to call processMail.pl. processMail.pl takes an optional parameter to locate the bonsai data directory, but if it's not present it will default to the directory where processMail.pl resides.
As the bonsai user, add a cron job to call 'processMail.pl'. For example,
MAILTO="root"
USER=bonsai
*/5 * * * * /usr/local/bonsai/processMail.pl
This will cause the bonsai mail to be processed every five minutes and to mail the root user if any errors occur.
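The "optional parameter, else the script's own directory" rule from the INSTALL text could look something like the sketch below. It is not the processMail.pl logic itself; `resolve_data_dir` is a hypothetical helper, and falling back to the script directory when the argument is not an existing directory is an assumption of this example.

```python
import os
import sys


def resolve_data_dir(argv, script_path):
    """Pick the bonsai data directory.

    Use the first command-line argument if it names an existing directory;
    otherwise fall back to the directory containing the script itself.
    """
    if len(argv) > 1 and os.path.isdir(argv[1]):
        return os.path.abspath(argv[1])
    return os.path.dirname(os.path.abspath(script_path))


if __name__ == "__main__":
    print(resolve_data_dir(sys.argv, __file__))
```

With the crontab entry quoted above, no argument is passed, so a script using this rule would work out of /usr/local/bonsai.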
Comment 20•17 years ago
Oh, so this patch actually makes bonsai run setuid now also? (There is no bonsai user on the production server currently) That in itself requires additional config changes, does it not?
Assignee
Comment 21•17 years ago
yes, if the production server isn't running with a bonsai user then setuid will need to be used. Let me know if there are any tweaks you need to make production life easier - they are probably changes that bonsai should have in the first place.
Comment 22•17 years ago
So... the Makefile apparently doesn't set the permissions up how they need to be for the setuid stuff to actually work... everything's still owned by apache or root.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 23•17 years ago
Attachment #289670 -
Flags: review?
Updated•17 years ago
Attachment #289670 -
Flags: review? → review?(cls)
Comment 24•17 years ago
Comment on attachment 289670 [details] [diff] [review]
Patch to fix Makefile
>+ @echo "Fixing permissions"
>+ @chown -R ${BONSAI_USER} $(PREFIX)
>+ @chgrp -R ${BONSAI_GROUP} $(PREFIX)
Why not |@chown -R {BONSAI_USER}:${BONSAI_GROUP} $(PREFIX)| instead of doubling the time it takes by doing them separately?
Comment 25•17 years ago
(In reply to comment #24)
> Why not |@chown -R {BONSAI_USER}:${BONSAI_GROUP} $(PREFIX)| instead of doubling
> the time it takes by doing them separately?
Because the combined version doesn't work on Solaris.
Attachment #289670 -
Flags: review?(cls) → review+
Comment 26•17 years ago
Checking in Makefile.in;
/cvsroot/mozilla/webtools/bonsai/Makefile.in,v <-- Makefile.in
new revision: 1.19; previous revision: 1.18
done
Status: REOPENED → RESOLVED
Closed: 17 years ago → 17 years ago
Resolution: --- → FIXED
Comment 27•17 years ago
The mail handling code here doesn't work. There's a taint error in handleCheckinMail.pl (Insecure dependency in chdir while running setuid at /etc/smrsh/handlewwwCheckinMail.pl line 28.) Even after fixing that (detaint $ARGV[0] before using it), I couldn't get any test commits to show up. I turned on debug mode, and it finds the queue file and deletes it, but the data from the file never makes it to the database. I ran out of time to debug this on the live system, and couldn't locate anyone familiar with the code, so I reverted back to the previous version of bonsai in production.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Updated•17 years ago
Summary: bonsai don't scale very well → Bonsai doesn't scale very well
Comment 28•17 years ago
oh, the other thing I ran into... processMail.pl is not installed by the Makefile.
Comment 29•16 years ago
Sendmail is pretty bad, AFAIK. There are better alternatives: http://shearer.org/MTA_Comparison http://www.geocities.com/mailsoftware42/ I'd try Postfix. It's fast and maintained. For the scripting, maybe Parallel or Stackless Python can help.
Comment 30•16 years ago
(In reply to comment #29)
> Sendmail is pretty bad, AFAIK. There are better alternatives: [...]
> I'd try Postfix. It's fast and maintained.
I love postfix to death, and we use it on our primary mail relays (where it matters), however, I suspect the choice of MTA in use here has very little relation if any to the problem at hand (which is in the bonsai application's handling of mail, not how it gets passed off to it by the MTA).
Comment 31•16 years ago
So, there were a number of problems:
* processMail.pl needed to be chmod'd +x
* if (($#ARGV >= 0) && (-d $ARGV[0])) is wrong, as it's always true
* $ARGV[0] needed to be detainted
* $0 doesn't work anymore for the current script's full path, as it's setuid now
This patch is currently being used on http://bonsai-stage.mozilla.org/, and it seems to be working. Yay!
While my fixes seem to work, there may be better ways to do what I'm doing. Please let me know if there's something better I could do.
Attachment #320146 -
Flags: review?(cls)
Attachment #320146 -
Flags: review?(bear)
Attachment #320146 -
Flags: review?(cls) → review+
Updated•16 years ago
Attachment #320146 -
Flags: review?(bear)
Comment 32•16 years ago
Checking in Makefile.in;
/cvsroot/mozilla/webtools/bonsai/Makefile.in,v <-- Makefile.in
new revision: 1.20; previous revision: 1.19
done
Checking in handleAdminMail.pl;
/cvsroot/mozilla/webtools/bonsai/handleAdminMail.pl,v <-- handleAdminMail.pl
new revision: 1.7; previous revision: 1.6
done
Checking in handleCheckinMail.pl;
/cvsroot/mozilla/webtools/bonsai/handleCheckinMail.pl,v <-- handleCheckinMail.pl
new revision: 1.9; previous revision: 1.8
done
Status: REOPENED → RESOLVED
Closed: 17 years ago → 16 years ago
Resolution: --- → FIXED
Comment 33•16 years ago
While I was upgrading all the Mozilla bonsai instances, I noticed that processMail.pl had some of the same issues that handle*Mail.pl had, along with a few other things. I took this opportunity to clean it up and use the commit as a test for the scripts. This has already been committed, but getting post-commit review.
Attachment #321223 -
Flags: review?(cls)
Comment 34•16 years ago
So, turns out a bug in the original processMail.pl caused bonsai-www and bonsai-l10n not to get updated properly (bug 434489). The issue was that there wasn't a chdir happening, which caused |require| and the |system()| calls not to work correctly, as the call wasn't happening in the right directory.
I didn't notice it when testing, as I was mostly testing bonsai.m.o, which worked fine due to bonsai's homedir on dm-webtools02 being /var/www/webtools/bonsai-cvs. I tested bonsai-www and bonsai-l10n, too, but I was manually running processMail.pl under the proper directory, so it worked fine. Oh well.
I've patched processMail.pl on dm-webtools02 with ajschult's patch, and everything seems to be working now. I committed ajschult's modification with my r+, but I still want to have my change and ajschult's change reviewed by bear or cls.
Checking in processMail.pl;
/cvsroot/mozilla/webtools/bonsai/processMail.pl,v <-- processMail.pl
new revision: 1.4; previous revision: 1.3
done
Attachment #321223 -
Attachment is obsolete: true
Attachment #321612 -
Flags: review?(cls)
Attachment #321612 -
Flags: review?(bear)
Attachment #321223 -
Flags: review?(cls)
Updated•16 years ago
Attachment #321612 -
Flags: review+
Comment 35•16 years ago
Comment on attachment 321612 [details] [diff] [review]
Clean-up processMail.pl to use @BONSAI_DIR@ and other fixes from ajschult - v2
r=cls if you add an |or die "";| to the chdir so that it's obvious why the script failed.
Attachment #321612 -
Flags: review?(cls) → review+
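The review request in comment 35 (make a failed chdir stop the script with an obvious message) is written for Perl; the same idea can be sketched in Python as below. `enter_dir_or_die` and the message wording are hypothetical, not taken from processMail.pl.

```python
import os
import sys


def enter_dir_or_die(path):
    """Change into the bonsai directory, or exit with a message naming the cause."""
    try:
        os.chdir(path)
    except OSError as err:
        # sys.exit with a string prints it to stderr and exits with status 1,
        # so cron (via MAILTO) reports why the run stopped.
        sys.exit(f"processMail: cannot chdir to {path}: {err}")
```

Failing early here matters because the relative |require| and |system()| calls described in comment 34 would otherwise run from the wrong directory with no indication of the cause.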
Comment 36•16 years ago
(In reply to comment #35)
> (From update of attachment 321612 [details] [diff] [review])
> r=cls if you add an |or die "";| to the chdir so that it's obvious why the
> script failed.
Checking in processMail.pl;
/cvsroot/mozilla/webtools/bonsai/processMail.pl,v <-- processMail.pl
new revision: 1.5; previous revision: 1.4
done
Updated•16 years ago
Attachment #321612 -
Flags: review?(bear)
Updated•8 years ago
Product: Webtools → Webtools Graveyard