Closed Bug 412507 Opened 17 years ago Closed 6 years ago

tinderbox doesn't figure out what signal killed a process correctly

Tracking

(Not tracked)

Status:

RESOLVED WONTFIX

People

(Reporter: mark, Assigned: mark)

Details

Attachments

(2 files)

Fix 17 years ago Mark Mentovai 2.43 KB, patch	nthomas : review+ coop : review+	Details \| Diff \| Splinter Review
Additional fix 17 years ago Mark Mentovai 2.19 KB, patch	coop : review+	Details \| Diff \| Splinter Review

Mark Mentovai

Assignee

Description

•

17 years ago

We had a couple of orange builds today, failing MozillaAliveTest on the Camino tinderbox.  Here's one:

http://tinderbox.mozilla.org/showlog.cgi?log=Camino/1200371880.1200372972.26899.gz

Running MozillaAliveTest test ...
Timeout = 15 seconds.
Begin: Mon Jan 14 20:55:56 2008
cmd = /builds/tinderbox/CmTrunk/Darwin_9.1.0_Depend/mozilla/../build/dist/Camino.app/Contents/MacOS/Camino -P default
End:   Mon Jan 14 20:56:07 2008
----------- Output from MozillaAliveTest ------------- 
  stdout and stderr unbuffered
----------- End Output from MozillaAliveTest --------- 
Error: Camino: received SIGHUP
Error: Camino: exited with status 16777215
Error: Camino: dumped core.
MozillaAliveTest: test failed

Why did the process get a signal AND exit with an exit status?  Furthermore, who sent it a SIGHUP?  Certainly not me or anyone else, and certainly not tinderbox, either.  What gives?

http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/tools/tinderbox/build-seamonkey-util.pl&rev=1.378&mark=1740-1742#1730

The calculation of $signal_num is totally bogus.  It should be ($? & 127) to only look at the low-order 7 bits of $?.  ($? >> 127) is a right-shift by a whopping 127 bits, and is very different.  Whoops.  Our process likely didn't die by SIGHUP, but ($? >> 127) = 1 for other reasons.

Also, the waitpid checks seem to be wrong.  We check for WIFEXITED but not WIFSIGNALED.

If $wait_pid is -1, as it may be in our tinderbox' case (we don't know), we should at least log the error, to make it clear that whatever happens next might be totally bogus.

Finally, we should set $? to 0 before we enter the waitpid loop, to avoid running off with some stale last-status value.

Mark Mentovai

Assignee

Comment 1

•

17 years ago

Attached patch Fix — Details — Splinter Review

I'm going to test this on cb-miniosx01.

Attachment #297251 - Flags: review?(nrthomas)

Nick Thomas [:nthomas] (UTC+12)

Comment 2

•

17 years ago

Comment on attachment 297251 [details] [diff] [review]
Fix

Seems ok to me, but seeking additional review from someone with more Perl under their belt.

Attachment #297251 - Flags: review?(nrthomas)

Attachment #297251 - Flags: review?(ccooper)

Attachment #297251 - Flags: review+

Chris Cooper [:coop] (he/him)

Comment 3

•

17 years ago

Comment on attachment 297251 [details] [diff] [review]
Fix

>+        if ($number <= $#sig_name) {
>+            return $sig_name[$number];
>+        }
>+        else {
>+            return $number;
>+        }

Do we hit instances where the sig number isn't in Config.pm? Should we be?

>         $exit_value = $? >> 8;
>-        $signal_num = $? >> 127;
>-        $dumped_core = $? & 128;
>+        $signal_num = $? & 0x7f;
>+        $dumped_core = $? & 0x80;

Any particular reason to switch to hex here? Bit shifting is opaque enough.

Attachment #297251 - Flags: review?(ccooper) → review+

Chris Cooper [:coop] (he/him)

Comment 4

•

17 years ago

FWIW, rhelmer pointed me at RunShellCommand in MozBuild that was originally designed to fix this broken piece of tinderbox:

http://mxr.mozilla.org/seamonkey/source/tools/release/MozBuild/Util.pm#49

Mark Mentovai

Assignee

Comment 5

•

17 years ago

(In reply to comment #3)
> Do we hit instances where the sig number isn't in Config.pm? Should we be?

It's not a likely case, but if you're using a Perl built on an older OS release, it might not be aware of newly-added signals.

> >         $exit_value = $? >> 8;
> >-        $signal_num = $? >> 127;
> >-        $dumped_core = $? & 128;
> >+        $signal_num = $? & 0x7f;
> >+        $dumped_core = $? & 0x80;
> 
> Any particular reason to switch to hex here? Bit shifting is opaque enough.

At least in my mind, the bit patterns corresponding to 0x7f and 0x80 are much clearer than the patterns corresponding to 127 and 128.  When used as masks, as they should be and as this patch does, the bit patterns are more important to understanding what the assignments actually do.  The shifts here were never right.

Mark Mentovai

Assignee

Comment 6

•

17 years ago

Checked in on the trunk.

Status: NEW → RESOLVED

Closed: 17 years ago

Resolution: --- → FIXED

Andrew Schultz

Comment 7

•

17 years ago

> FWIW, rhelmer pointed me at RunShellCommand in MozBuild that was originally
> designed to fix this broken piece of tinderbox:

==> Bug 374568

Mark Mentovai

Assignee

Comment 8

•

17 years ago

Attached patch Additional fix — Details — Splinter Review

We still weren't getting proper diagnostics when processes exited with nonzero exit statuses or were killed by signals.  I tracked this down to two problems:

. wait_for_pid was returning "done" instead of the %hash containing the exit
  information.
. print_test_errors checked if the exit status was nonzero before printing
  information about the exit value and signal, but if the process was killed
  by a signal, the exit value as passed back by waitpid in $? would be zero.

I've gotten ride of the "done" thing and have wait_for_pid now always returning the %hash, but if a process exited by timeout, the values in the return %hash are just left at 0.

I've fixed print_test_errors to check the variable it's going to print before printing it, instead of gating everything based on the presence of a nonzero exit value.

I'm currently testing this patch on the three Camino tinderboxen.

Attachment #299307 - Flags: review?

Mark Mentovai

Assignee

Updated

•

17 years ago

Attachment #299307 - Flags: review? → review?(ccooper)

Mark Mentovai

Assignee

Updated

•

17 years ago

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Chris Cooper [:coop] (he/him)

Comment 9

•

17 years ago

Comment on attachment 299307 [details] [diff] [review]
Additional fix

Looks good.

Attachment #299307 - Flags: review?(ccooper) → review+

Nobody; OK to take it and work on it

Updated

•

10 years ago

Product: Webtools → Webtools Graveyard

Sylvestre Ledru [:Sylvestre]

Comment 10

•

6 years ago

Tinderbox isn't maintained anymore. Closing.

Status: REOPENED → RESOLVED

Closed: 17 years ago → 6 years ago

Resolution: --- → WONTFIX

You need to log in before you can comment on or make changes to this bug.

Bugzilla

Quick Search

tinderbox doesn't figure out what signal killed a process correctly

Categories

(Webtools Graveyard :: Tinderbox, defect)

Tracking

(Not tracked)

People

(Reporter: mark, Assigned: mark)

References

Details

Crash Data

Security

(public)

User Story

Attachments

(2 files)

Description

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Updated

Updated

Comment 9

Updated

Comment 10

Attachment

General

Description

File Name

Content Type