Closed
Bug 412507
Opened 17 years ago
Closed 6 years ago
tinderbox doesn't figure out what signal killed a process correctly
Categories
(Webtools Graveyard :: Tinderbox, defect)
Webtools Graveyard
Tinderbox
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: mark, Assigned: mark)
Details
Attachments
(2 files)
2.43 KB,
patch
|
nthomas
:
review+
coop
:
review+
|
Details | Diff | Splinter Review |
2.19 KB,
patch
|
coop
:
review+
|
Details | Diff | Splinter Review |
We had a couple of orange builds today, failing MozillaAliveTest on the Camino tinderbox. Here's one: http://tinderbox.mozilla.org/showlog.cgi?log=Camino/1200371880.1200372972.26899.gz Running MozillaAliveTest test ... Timeout = 15 seconds. Begin: Mon Jan 14 20:55:56 2008 cmd = /builds/tinderbox/CmTrunk/Darwin_9.1.0_Depend/mozilla/../build/dist/Camino.app/Contents/MacOS/Camino -P default End: Mon Jan 14 20:56:07 2008 ----------- Output from MozillaAliveTest ------------- stdout and stderr unbuffered ----------- End Output from MozillaAliveTest --------- Error: Camino: received SIGHUP Error: Camino: exited with status 16777215 Error: Camino: dumped core. MozillaAliveTest: test failed Why did the process get a signal AND exit with an exit status? Furthermore, who sent it a SIGHUP? Certainly not me or anyone else, and certainly not tinderbox, either. What gives? http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/tools/tinderbox/build-seamonkey-util.pl&rev=1.378&mark=1740-1742#1730 The calculation of $signal_num is totally bogus. It should be ($? & 127) to only look at the low-order 7 bits of $?. ($? >> 127) is a right-shift by a whopping 127 bits, and is very different. Whoops. Our process likely didn't die by SIGHUP, but ($? >> 127) = 1 for other reasons. Also, the waitpid checks seem to be wrong. We check for WIFEXITED but not WIFSIGNALED. If $wait_pid is -1, as it may be in our tinderbox' case (we don't know), we should at least log the error, to make it clear that whatever happens next might be totally bogus. Finally, we should set $? to 0 before we enter the waitpid loop, to avoid running off with some stale last-status value.
Assignee | ||
Comment 1•17 years ago
|
||
I'm going to test this on cb-miniosx01.
Attachment #297251 -
Flags: review?(nrthomas)
Comment 2•17 years ago
|
||
Comment on attachment 297251 [details] [diff] [review] Fix Seems ok to me, but seeking additional review from someone with more Perl under their belt.
Attachment #297251 -
Flags: review?(nrthomas)
Attachment #297251 -
Flags: review?(ccooper)
Attachment #297251 -
Flags: review+
Comment 3•17 years ago
|
||
Comment on attachment 297251 [details] [diff] [review] Fix >+ if ($number <= $#sig_name) { >+ return $sig_name[$number]; >+ } >+ else { >+ return $number; >+ } Do we hit instances where the sig number isn't in Config.pm? Should we be? > $exit_value = $? >> 8; >- $signal_num = $? >> 127; >- $dumped_core = $? & 128; >+ $signal_num = $? & 0x7f; >+ $dumped_core = $? & 0x80; Any particular reason to switch to hex here? Bit shifting is opaque enough.
Attachment #297251 -
Flags: review?(ccooper) → review+
Comment 4•17 years ago
|
||
FWIW, rhelmer pointed me at RunShellCommand in MozBuild that was originally designed to fix this broken piece of tinderbox: http://mxr.mozilla.org/seamonkey/source/tools/release/MozBuild/Util.pm#49
Assignee | ||
Comment 5•17 years ago
|
||
(In reply to comment #3) > Do we hit instances where the sig number isn't in Config.pm? Should we be? It's not a likely case, but if you're using a Perl built on an older OS release, it might not be aware of newly-added signals. > > $exit_value = $? >> 8; > >- $signal_num = $? >> 127; > >- $dumped_core = $? & 128; > >+ $signal_num = $? & 0x7f; > >+ $dumped_core = $? & 0x80; > > Any particular reason to switch to hex here? Bit shifting is opaque enough. At least in my mind, the bit patterns corresponding to 0x7f and 0x80 are much clearer than the patterns corresponding to 127 and 128. When used as masks, as they should be and as this patch does, the bit patterns are more important to understanding what the assignments actually do. The shifts here were never right.
Assignee | ||
Comment 6•17 years ago
|
||
Checked in on the trunk.
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Comment 7•17 years ago
|
||
> FWIW, rhelmer pointed me at RunShellCommand in MozBuild that was originally > designed to fix this broken piece of tinderbox: ==> Bug 374568
Assignee | ||
Comment 8•17 years ago
|
||
We still weren't getting proper diagnostics when processes exited with nonzero exit statuses or were killed by signals. I tracked this down to two problems: . wait_for_pid was returning "done" instead of the %hash containing the exit information. . print_test_errors checked if the exit status was nonzero before printing information about the exit value and signal, but if the process was killed by a signal, the exit value as passed back by waitpid in $? would be zero. I've gotten ride of the "done" thing and have wait_for_pid now always returning the %hash, but if a process exited by timeout, the values in the return %hash are just left at 0. I've fixed print_test_errors to check the variable it's going to print before printing it, instead of gating everything based on the presence of a nonzero exit value. I'm currently testing this patch on the three Camino tinderboxen.
Attachment #299307 -
Flags: review?
Assignee | ||
Updated•17 years ago
|
Attachment #299307 -
Flags: review? → review?(ccooper)
Assignee | ||
Updated•17 years ago
|
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 9•17 years ago
|
||
Comment on attachment 299307 [details] [diff] [review] Additional fix Looks good.
Attachment #299307 -
Flags: review?(ccooper) → review+
Updated•10 years ago
|
Product: Webtools → Webtools Graveyard
Comment 10•6 years ago
|
||
Tinderbox isn't maintained anymore. Closing.
Status: REOPENED → RESOLVED
Closed: 17 years ago → 6 years ago
Resolution: --- → WONTFIX
You need to log in
before you can comment on or make changes to this bug.
Description
•