Bug 539691: automation.py should not try to invoke breakpad when a debugger is specified
Opened 15 years ago · Closed 14 years ago
Categories: Testing :: Mochitest, defect
Tracking: (Not tracked)
Status: RESOLVED FIXED
Reporter: bc · Assigned: jruderman
Attachments (2 files, 1 obsolete file):
- 1.58 KB, text/plain
- 5.13 KB, patch (ted: review+)
Description:

When running tests and a test times out, automation.py.in will try to invoke breakpad by sending a SIGSEGV to the process. However, when a debugger is specified, the signal goes to the debugger instead. In the case of valgrind 3.5.0, this results in valgrind working on a defunct process, and neither valgrind nor the test run is terminated.

  make -C firefox-debug EXTRA_TEST_ARGS='--debugger=valgrind --debugger-args="--leak-check=full --trace-children=yes" --timeout=600' TEST_PATH=js/src/tests/js1_5/Function/jstests.list jstestbrowser
Reporter · Updated 15 years ago
Version: 1.9.2 Branch → Trunk
Assignee · Comment 1 · 14 years ago

> When running tests and a test times out, automation.py.in will try to
> invoke breakpad by sending a SIGSEGV to the process.

Yeah, the condition in triggerBreakpad is wrong. Instead of just "self.CRASHREPORTER", it should also check for "not debuggerInfo", similar to the call to environment().

> However when a debugger is
> specified, the signal goes to the debugger instead. In the case of valgrind
> 3.5.0, it will result in valgrind working on a defunct process and not
> terminate valgrind or the test.

Are you sure this is what's happening? I thought Valgrind did all its emulation in-process. The os.kill should go to Valgrind itself, and it shouldn't matter whether it's a SEGV or a KILL.
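The fix described in comment 1 can be sketched as follows. This is an illustrative helper, not the actual automation.py.in code; the function and argument names are hypothetical stand-ins for `self.CRASHREPORTER` and `debuggerInfo`:

```python
def should_trigger_breakpad(crashreporter, debugger_info):
    """Sketch of the triggerBreakpad guard (hypothetical names).

    The buggy version checked only the crash-reporter build flag,
    so a SIGSEGV was sent even when a debugger such as valgrind
    wrapped the process. The fix also requires that no debugger is
    attached, mirroring the check made in the call to environment().
    """
    return bool(crashreporter and not debugger_info)
```

With a debugger attached, the helper declines to trigger breakpad, so no SIGSEGV is sent to the wrapping process.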
Assignee · Comment 2 · 14 years ago

Assignee: nobody → jruderman
Status: NEW → ASSIGNED
Attachment #422344 - Flags: review?(ted.mielczarek)
Reporter · Comment 3 · 14 years ago

(In reply to comment #1)
> Are you sure this is what's happening?

I think so. I tried your patch on Mac with a relatively short timeout of 60 seconds and it killed the test run before it could really get started. I've started a different run with a longer timeout of 300 seconds but haven't hit the timeout condition on Mac yet. I also tried your patch on Linux with a timeout of 300 seconds and still get the failure situation. I'll attach the ps aux output for the relevant processes.

I've been thinking about this whole situation and I think bug 501034 is wrong in its use of the --timeout value. The way I understand it, --timeout is used by the test frameworks to decide when to time out a particular test and fail it before moving on to the next test. The way bug 501034's fix works, it instead attempts to kill the entire test run when any single test times out. I think it should instead use a different timeout value to decide whether the test run has actually hung, as opposed to a single test timing out.
Reporter · Comment 4 · 14 years ago

I might have been wrong about Linux in the previous comment. The test run recovered and is testing the remainder of the tests. I won't have access to the log until it completes, though. I'll update the bug later tonight when it does complete.
Comment 5 · 14 years ago

(In reply to comment #3)
> I've been thinking about this whole situation and I think bug 501034 is wrong
> in its use of the --timeout value. The way I understand it, --timeout is used
> by the test frameworks to decide when to timeout a particular test and fail it
> before moving on to the next test. The way bug 501034's fix works, it instead
> attempts to kill the entire test run when any single test times out. I think it
> should instead use a different timeout value to decide if the test run has
> actually hung instead of a single test timing out.

I tried to make this smart, and make the "hang" timeout equal to the harnesses' internal timeouts + 30 seconds, to give each harness 30 seconds to deal with tests that time out, and only then kill them. See:

http://mxr.mozilla.org/mozilla-central/source/testing/mochitest/runtests.py.in#462
http://mxr.mozilla.org/mozilla-central/source/layout/tools/reftest/runreftest.py#138
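Ted's scheme from comment 5 amounts to a simple sum. A minimal sketch under that description, with a hypothetical function name rather than the actual runtests.py.in code:

```python
def outer_kill_timeout(per_test_timeout_s, slack_s=30):
    # The outer automation layer waits for the harness's own
    # per-test timeout plus a fixed 30 s of slack, giving the
    # harness a chance to fail the timed-out test itself before
    # the whole run is killed.
    return per_test_timeout_s + slack_s
```

So a 300 s --timeout yields a 330 s hang limit for the run as a whole.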
Assignee · Comment 6 · 14 years ago

30s might not be enough when Valgrind is involved.
Comment 7 · 14 years ago

Well, your patch here will make that moot anyway. :)
Reporter · Comment 8 · 14 years ago

Jesse: The patch appears to work fine for me by not attempting to kill the process when a debugger is specified. Thanks.

Ted: Yeah, this makes the issue about killing a timed-out process moot for cases where a debugger is involved. But I think instead of adding a fixed 30 seconds to the user-specified timeout, scaling the user timeout would be a little better. For example, if you used twice the user-specified timeout, the additional wait would not be that long relative to a normal test and would, I think, reduce the cases where the entire run is killed due to a particularly slow test+recovery.
Comment 9 · 14 years ago

It could, sure, but if the test harness isn't responding within 30 seconds, I don't have a lot of faith in it doing anything useful.
Assignee · Comment 10 · 14 years ago

> Jesse: The patch appears to work fine for me by not attempting to kill the
> process when a debugger is specified. Thanks.

Eh? It should still kill the process, just using SIGKILL rather than SIGSEGV.
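The distinction Jesse draws in comment 10 (still kill, just with a different signal) can be sketched like this. An illustrative helper under the bug's description, not the real patch:

```python
import signal

def timeout_kill_signal(debugger_info):
    # Without a debugger, SIGSEGV lets Breakpad write a minidump
    # before the browser dies. With a debugger wrapping the process,
    # the signal would land on the debugger instead, so a plain
    # SIGKILL is the cleaner choice.
    return signal.SIGKILL if debugger_info else signal.SIGSEGV
```

Either way the timed-out process tree is terminated; only the crash-report side effect differs.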
Reporter · Comment 11 · 14 years ago

Crap, you're right of course. I got confused by the timeout used in automation.py.in for detecting timeouts in reading lines from the output, rather than the test itself timing out. I'll try the mochitest-plain tests, which is where I originally saw the issue, but it will take quite a while to run. You might just want to go with Ted's review.
Comment 12 · 14 years ago

Comment on attachment 422344 [details] [diff] [review] (patch to automation.py.in):

I don't think this is what we want. I think we should just ignore the "timeout" value (set it to None) when a debugger is attached. Presumably we don't want to send SIGKILL to the user's debugger if they're sitting there in gdb.

Attachment #422344 - Flags: review?(ted.mielczarek) → review-
Assignee · Comment 13 · 14 years ago

For non-interactive debuggers, you really want the timeouts to continue working. Especially Valgrind. I understand your concern about interactive debuggers. Perhaps we should add:

  if debuggerInfo and debuggerInfo["interactive"]:
    timeout = None
    maxTime = None

We could even change automationutils.py so unrecognized debuggers are assumed to be interactive. But those changes could just as easily be made in a separate bug and patch.
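Comment 13's proposal, covering both the per-test timeout and maxTime, can be sketched as a small helper. The function name is hypothetical; the logic follows the snippet in the comment:

```python
def effective_timeouts(timeout, max_time, debugger_info):
    # Disable both limits only for interactive debuggers (e.g. gdb),
    # so non-interactive ones such as valgrind still time out.
    # Treating unrecognized debuggers as interactive, as suggested,
    # would just mean defaulting "interactive" to True here.
    if debugger_info and debugger_info.get("interactive"):
        return None, None
    return timeout, max_time
```

A valgrind run (interactive=False) keeps its limits, while a gdb session runs without any.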
Assignee · Updated 14 years ago

Attachment #422344 - Flags: review- → review?(ted.mielczarek)
Comment 14 · 14 years ago

Comment on attachment 422344 [details] [diff] [review] (patch to automation.py.in):

This still doesn't fix the problem, in that you're still going to call proc.kill() if the process times out, and kill the user's debugger. I really think the right fix here is to set the timeout to None if a debugger is attached. If you want to special-case interactive debuggers, that's fine, but we should just not be running the timeout-handling code *at all* in this case.

Attachment #422344 - Flags: review?(ted.mielczarek) → review-
Assignee · Comment 15 · 14 years ago

Don't time out when using an interactive debugger. Kill non-interactive debuggers in a cleaner way.

Attachment #422344 - Attachment is obsolete: true
Attachment #428861 - Flags: review?(ted.mielczarek)
Comment 16 · 14 years ago

Comment on attachment 428861 [details] [diff] [review] (patch v2):

Ok, this makes sense now. We should probably make more things into member variables at some point (like debuggerInfo) so we don't have to pass them around via class methods.

Attachment #428861 - Flags: review?(ted.mielczarek) → review+
Assignee · Comment 17 · 14 years ago

http://hg.mozilla.org/mozilla-central/rev/659d517acd61

Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED