Closed
Bug 372582
Opened 18 years ago
Closed 18 years ago
RunShellCommand does not reap timed-out children
Categories
(Release Engineering :: General, defect)
Release Engineering
General
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: ajschult784, Assigned: ajschult784)
References
Details
Attachments
(2 files, 1 obsolete file)
2.01 KB,
patch
|
Details | Diff | Splinter Review | |
5.36 KB,
patch
|
rhelmer
:
review+
|
Details | Diff | Splinter Review |
RunShellCommand's timeout alarm interrupts the while loop that waits for the child process to exit and so any timed-out process is never reaped. Ideally, the while loop would ignore the alarm, but I don't know how to pull that off (or if it's possible), so I added some bits after the process is killed to reap it.
I also switched WCOREDUMP to ($? & 128) since both of my systems don't seem to have this and it's not officially part of perl's POSIX http://perldoc.perl.org/POSIX.html
Attachment #257238 -
Flags: review?(rhelmer)
Comment 1•18 years ago
|
||
This takes care of:
-- reap timed out children; there's no looping/sleeping logic because I'm not convinced such logic should exist; on systems that are well-behaved (mac/linux; win32 with msys?) kill() with -9 does the right thing, and waitpid() should as well. (Arguably, this patch should waitpid() withOUT WNOHANG, but then we'd have to put in a timeout for that, which... gets scary.)
-- if output is true, and STDOUT is still in buffered mode, printOutputImmediately is somewhat misleading, since output won't be printed immediately. I ran into this bug with the new dumpLogs config setting that was added as part of bug 371350. There are some steps that talk a long time (rm -rf mozilla), and you'd get partial printouts while that command took its sweet time (especially on Crapdows). This fixes that, plus restores the buffering setting to its previous state when we're done.
Attachment #258367 -
Flags: review?(rhelmer)
Assignee | ||
Comment 2•18 years ago
|
||
> -- reap timed out children; there's no looping/sleeping logic because
> I'm not convinced such logic should exist;
Actually, in testing, it would take ~2 seconds to reap a simple process like "sleep 300". More specifically, a process temporarily in uninterruptible sleep (perhaps waiting on simple disk I/O) will never be reaped with your patch.
Comment 3•18 years ago
|
||
(In reply to comment #2)
> > -- reap timed out children; there's no looping/sleeping logic because
> > I'm not convinced such logic should exist;
>
> Actually, in testing, it would take ~2 seconds to reap a simple process like
> "sleep 300".
Platform? Was the system loaded?
If this is the case, I suppose we could layer on the timeouts (call waitpid() without WNOHANG, and set another alarm()), but I'm still not convinced about the "retry waitpid() and sleep() some to make sure it works"-logic.
> More specifically, a process temporarily in uninterruptible sleep
> (perhaps waiting on simple disk I/O) will never be reaped with your patch.
I'm not worried about handling processes that are dead in I/O. waitpid() with a timeout should suffice.
This patch at least does the right thing with respect to non-broken and -pathological systems, AND it includes the STDOUT patch, which is being actively annoying, as opposed to this bug, which is almost-never annoying, so I'll let rhelmer review it, and do a follow-up to do the waitpid() timeout thinger.
Assignee | ||
Comment 4•18 years ago
|
||
> Platform? Was the system loaded?
Linux, FC6. System was mostly idle, I think.
There's no guarantee that the signal is handled before kill exits (on any platform I'd expect, certainly *nix, including Macs), which would seem to be a requirement for what you're saying to be true.
And further testing (w/idle system) shows "2 seconds" is wrong. It did take two passes through the loop, but that's 1 second. If I remove the sleep entirely (so just looping as fast as possible), the process gets reaped on the 3rd pass.
Comment 5•18 years ago
|
||
Same as before, except give waitpid() a little breathing room.
Attachment #258367 -
Attachment is obsolete: true
Attachment #258478 -
Flags: review?(rhelmer)
Attachment #258367 -
Flags: review?(rhelmer)
Assignee | ||
Updated•18 years ago
|
Attachment #257238 -
Flags: review?(rhelmer)
Comment 6•18 years ago
|
||
Comment on attachment 258478 [details] [diff] [review]
Reap timed out children + other fixes, Take II
>+ # Processes get 10 seconds to obey.
>+ eval {
>+ local $SIG{'ALRM'} = sub { die("alarm\n") };
>+ alarm($EXEC_REAP_TIMEOUT);
>+ my $waitRv = waitpid($childPid, 0);
>+ alarm(0);
>+ # Don't fill in these values if they're bogus.
>+ if ($waitRv > 0) {
>+ $exitValue = WEXITSTATUS($?);
>+ $signalNum = WIFSIGNALED($?) && WTERMSIG($?);
>+ $dumpedCore = WIFSIGNALED($?) && ($? & 128);
>+ }
>+ };
Hmm, comment says "Processes get 10 seconds to obey", shouldn't that be alarm(10) or am I missing something?
Comment 7•18 years ago
|
||
(In reply to comment #6)
> (From update of attachment 258478 [details] [diff] [review])
> >+ # Processes get 10 seconds to obey.
> >+ eval {
> >+ local $SIG{'ALRM'} = sub { die("alarm\n") };
> >+ alarm($EXEC_REAP_TIMEOUT);
> >+ my $waitRv = waitpid($childPid, 0);
> >+ alarm(0);
> >+ # Don't fill in these values if they're bogus.
> >+ if ($waitRv > 0) {
> >+ $exitValue = WEXITSTATUS($?);
> >+ $signalNum = WIFSIGNALED($?) && WTERMSIG($?);
> >+ $dumpedCore = WIFSIGNALED($?) && ($? & 128);
> >+ }
> >+ };
>
> Hmm, comment says "Processes get 10 seconds to obey", shouldn't that be
> alarm(10) or am I missing something?
First diff chunks says:
+my $EXEC_REAP_TIMEOUT = 10;
I could've said "Processes get $EXEC_REAP_TIMEOUT seconds to obey," but then you'd have to go find the definition that variable... which is annoying.
Then again, remembering to change the comment if you ever change the timeout is annoying too.
Updated•18 years ago
|
Attachment #258478 -
Flags: review?(rhelmer) → review+
Updated•18 years ago
|
Status: NEW → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
Updated•11 years ago
|
Product: mozilla.org → Release Engineering
You need to log in
before you can comment on or make changes to this bug.
Description
•