Closed Bug 539691 Opened 15 years ago Closed 14 years ago

automation.py should not try to invoke breakpad when a debugger is specified

Categories: Testing :: Mochitest, defect
Platform: x86 / All
Severity: normal
Tracking: (Not tracked)
Status: RESOLVED FIXED
People: (Reporter: bc, Assigned: jruderman)
Attachments: (2 files, 1 obsolete file)

When running tests and a test times out, automation.py.in will try to invoke breakpad by sending a SIGSEGV to the process. However, when a debugger is specified, the signal goes to the debugger instead. In the case of valgrind 3.5.0, this results in valgrind working on a defunct process, and neither valgrind nor the test terminates.

make -C firefox-debug EXTRA_TEST_ARGS='--debugger=valgrind --debugger-args="--leak-check=full --trace-children=yes" --timeout=600' TEST_PATH=js/src/tests/js1_5/Function/jstests.list jstestbrowser
Version: 1.9.2 Branch → Trunk
Blocks: 521331
> When running tests and a test times out, automation.py.in will try to 
> invoke breakpad by sending a SIGSEGV to the process.

Yeah, the condition in triggerBreakpad is wrong.  Instead of just "self.CRASHREPORTER", it should also check for "not debuggerInfo", similar to the call to environment().
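The corrected guard described here might look roughly like this (a minimal sketch; the function and argument names are illustrative, not the actual automation.py.in code):

```python
import os
import signal


def trigger_breakpad(proc, crashreporter_enabled, debugger_info):
    """Hypothetical sketch of the fixed condition.

    Only ask Breakpad for a minidump when the crash reporter is active
    AND no debugger is wrapping the process; otherwise the SIGSEGV would
    be delivered to (or intercepted by) the debugger instead of Firefox.
    Returns True if the signal was sent.
    """
    if crashreporter_enabled and not debugger_info:
        os.kill(proc.pid, signal.SIGSEGV)
        return True
    return False
```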

> However when a debugger is
> specified, the signal goes to the debugger instead. In the case of valgrind
> 3.5.0, it will result in valgrind working on a defunct process and not
> terminate valgrind or the test.

Are you sure this is what's happening?  I thought Valgrind did all its emulation in-process.  The os.kill should go to Valgrind itself, and it shouldn't matter whether it's a SEGV or a KILL.
Attached patch patch to automation.py.in (obsolete) — Splinter Review
Assignee: nobody → jruderman
Status: NEW → ASSIGNED
Attachment #422344 - Flags: review?(ted.mielczarek)
(In reply to comment #1)

> Are you sure this is what's happening?

I think so.

I tried your patch on Mac with a relatively short timeout of 60 seconds and it killed the test run before it could really get started. I've started a different run with a longer timeout of 300 seconds but haven't hit the timeout condition on Mac yet.

I also tried your patch on Linux with a timeout of 300 seconds and still get the failure situation. I'll attach the ps aux output for the relevant processes.

I've been thinking about this whole situation and I think bug 501034 is wrong in its use of the --timeout value. The way I understand it, --timeout is used by the test frameworks to decide when to time out a particular test and fail it before moving on to the next test. The way bug 501034's fix works, it instead attempts to kill the entire test run when any single test times out. I think it should instead use a different timeout value to decide whether the test run has actually hung, as opposed to a single test timing out.
I might have been wrong about Linux in the previous comment. The test run recovered and is testing the remainder of the tests. I won't have access to the log until it completes though. I'll update the bug later tonight when it does complete.
(In reply to comment #3)
> I've been thinking about this whole situation and I think bug 501034 is wrong
> in its use of the --timeout value. The way I understand it, --timeout is used
> by the test frameworks to decide when to timeout a particular test and fail it
> before moving on to the next test. The way bug 501034's fix works, it instead
> attempts to kill the entire test run when any single test times out. I think it
> should instead use a different timeout value to decide if the test run has
> actually hung instead of a single test timing out.

I tried to make this smart, and make the "hang" timeout equal to the harnesses' internal timeouts + 30 seconds, to give each harness 30 seconds to deal with tests that time out, and only then kill them.

See:
http://mxr.mozilla.org/mozilla-central/source/testing/mochitest/runtests.py.in#462
http://mxr.mozilla.org/mozilla-central/source/layout/tools/reftest/runreftest.py#138
30s might not be enough when Valgrind is involved.
Well, your patch here will make that moot anyway. :)
Jesse: The patch appears to work fine for me by not attempting to kill the process when a debugger is specified. Thanks.

Ted: Yeah, this makes the issue about killing a timed-out process moot for cases where a debugger is involved. But I think instead of adding a fixed 30 seconds to the user-specified timeout, scaling the user timeout would be a little bit better. For example, if you used twice the user-specified timeout, the additional wait would not be that long relative to a normal test and would, I think, reduce the cases where the entire run was killed due to a particularly slow test+recovery.
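The two alternatives being discussed (a fixed grace period versus scaling the per-test timeout) can be sketched as follows. This is a hypothetical helper, not code from either patch:

```python
def hang_timeout(per_test_timeout_s, grace_s=30, scale=None):
    """Return the outer "hang" timeout for the whole test run.

    With scale=None this is the fixed-grace approach: the per-test
    timeout plus a grace period for the harness to clean up a timed-out
    test. With a scale factor, it is the alternative suggested above:
    multiply the user-specified timeout instead.
    """
    if scale is not None:
        return per_test_timeout_s * scale
    return per_test_timeout_s + grace_s
```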
It could, sure, but if the test harness isn't responding within 30 seconds, I don't have a lot of faith in it doing anything useful.
> Jesse: The patch appears to work fine for me by not attempting to kill the
> process when a debugger is specified. Thanks.

Eh?  It should still kill the process, just using SIGKILL rather than SIGSEGV.
Crap, you're right of course. I got confused by the timeout used in automation.py.in for detecting timeouts in reading lines from the output, rather than the test itself timing out. I'll try the mochitest-plain tests, which is where I originally saw the issue, but it will take quite a while to run. You might just want to go with Ted's r.
Comment on attachment 422344 [details] [diff] [review]
patch to automation.py.in

I don't think this is what we want. I think we should just ignore the "timeout" value (set it to None) when a debugger is attached. Presumably we don't want to send SIGKILL to the user's debugger if they're sitting there in gdb.
Attachment #422344 - Flags: review?(ted.mielczarek) → review-
For non-interactive debuggers, you really want the timeouts to continue working.  Especially Valgrind.

I understand your concern about interactive debuggers.  Perhaps we should add:

if debuggerInfo and debuggerInfo["interactive"]:
  timeout = None
  maxTime = None

We could even change automationutils.py so unrecognized debuggers are assumed to be interactive.  But those changes could just as easily be made in a separate bug and patch.
Attachment #422344 - Flags: review- → review?(ted.mielczarek)
Comment on attachment 422344 [details] [diff] [review]
patch to automation.py.in

This still doesn't fix the problem, in that you're still going to call proc.kill() if the process times out, and kill the user's debugger. I really think the right fix here is to set the timeout to None if a debugger is attached. If you want to special-case interactive debuggers, that's fine, but we should just not be running the timeout handling code *at all* in this case.
Attachment #422344 - Flags: review?(ted.mielczarek) → review-
Attached patch patch v2 — Splinter Review
Don't time out when using an interactive debugger. Kill non-interactive debuggers in a cleaner way.
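The behavior described for patch v2 might be sketched like this (illustrative names, not the actual patch code): interactive debuggers disable timeout handling entirely, while non-interactive wrappers such as valgrind are killed directly with SIGKILL instead of being poked with SIGSEGV for Breakpad:

```python
import signal


def on_timeout(proc, debugger_info):
    """Hypothetical sketch of the patch v2 timeout handler.

    If an interactive debugger (e.g. gdb) is attached, do nothing: the
    harness runs with no timeout, so the user's session is never killed.
    Otherwise, send SIGKILL to the wrapper process for a clean kill
    rather than a SIGSEGV aimed at Breakpad.
    """
    if debugger_info and debugger_info.get("interactive"):
        return
    proc.send_signal(signal.SIGKILL)
```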
Attachment #422344 - Attachment is obsolete: true
Attachment #428861 - Flags: review?(ted.mielczarek)
Comment on attachment 428861 [details] [diff] [review]
patch v2

Ok, this makes sense now. We should probably make more things into member variables at some point (like debuggerInfo) so we don't have to pass them around via class methods.
Attachment #428861 - Flags: review?(ted.mielczarek) → review+
http://hg.mozilla.org/mozilla-central/rev/659d517acd61
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED