Closed
Bug 1270144
Opened 7 years ago
Closed 7 years ago
handleTimeout() and friends should indicate they will kill the process
Categories
(Testing :: Mochitest, defect)
Testing
Mochitest
Tracking
(firefox49 fixed)
RESOLVED
FIXED
mozilla49
Tracking | Status | |
---|---|---|
firefox49 | --- | fixed |
People
(Reporter: Gijs, Assigned: Gijs)
Details
Attachments
(1 file)
As in summary. Right now, reading the log it's not obvious whether the crash stacks that follow are from a crash that happened causing the timeout, or just the result of us killing the process because of the timeout. The string "application timed out after %d seconds" has quite some hits in DXR; ideally we should update all of the ones that result in us pskill'ing the process.
Assignee | ||
Comment 1•7 years ago
|
||
Review commit: https://reviewboard.mozilla.org/r/51733/diff/#index_header See other reviews: https://reviewboard.mozilla.org/r/51733/
Attachment #8751004 -
Flags: review?(ryanvm)
Attachment #8751004 -
Flags: review?(ahalberstadt)
Comment 2•7 years ago
|
||
Comment on attachment 8751004 [details] MozReview Request: Bug 1270144 - fix message from timed out tests to indicate the following crash is caused by us terminating the browser process, r?ahal,RyanVM https://reviewboard.mozilla.org/r/51733/#review48739 It might also be nice to add a new log line to killAndGetStack along the lines of: "Forcing crash to get stack" Which might make it even more obvious that this is not a legit crash. ::: testing/mochitest/runtests.py:2444 (Diff revision 1) > def handleTimeout(self, timeout, proc, utilityPath, debuggerInfo, > browser_pid, processLog): > """handle process output timeout""" > # TODO: bug 913975 : _processOutput should call self.processOutputLine > # one more time one timeout (I think) > - error_message = "TEST-UNEXPECTED-TIMEOUT | %s | application timed out after %d seconds with no output" % ( > + error_message = "TEST-UNEXPECTED-TIMEOUT | %s | application timed out after %d seconds with no output - going to terminate it." % ( nit: might as well use a comma here for consistency
Attachment #8751004 -
Flags: review?(ahalberstadt) → review+
Comment 3•7 years ago
|
||
Comment on attachment 8751004 [details] MozReview Request: Bug 1270144 - fix message from timed out tests to indicate the following crash is caused by us terminating the browser process, r?ahal,RyanVM I'm really hoping this change in signature doesn't break all the "application timed out" oranges we have on file. Offhand, I don't *think* it will, but it does make me nervous. I'm a fan of ahal's idea of putting a second line in the log instead, along the lines of "INFO - Force-crashing the hung process to get a stack." or something to that effect.
Attachment #8751004 -
Flags: review?(ryanvm)
Assignee | ||
Comment 4•7 years ago
|
||
radar+ (In reply to Ryan VanderMeulen [:RyanVM] from comment #3) > Comment on attachment 8751004 [details] > MozReview Request: Bug 1270144 - fix message from timed out tests to > indicate the following crash is caused by us terminating the browser > process, r?ted,RyanVM > > I'm really hoping this change in signature doesn't break all the > "application timed out" oranges we have on file. Offhand, I don't *think* it > will, but it does make me nervous. > > I'm a fan of ahal's idea of putting a second line in the log instead, along > the lines of "INFO - Force-crashing the hung process to get a stack." or > something to that effect. Fair enough. Will update the patch.
Assignee: nobody → gijskruitbosch+bugs
Status: NEW → ASSIGNED
Assignee | ||
Comment 5•7 years ago
|
||
Comment on attachment 8751004 [details] MozReview Request: Bug 1270144 - fix message from timed out tests to indicate the following crash is caused by us terminating the browser process, r?ahal,RyanVM Review request updated; see interdiff: https://reviewboard.mozilla.org/r/51733/diff/1-2/
Attachment #8751004 -
Attachment description: MozReview Request: Bug 1270144 - fix message from timed out tests to indicate the following crash is caused by us terminating the browser process, r?ted,RyanVM → MozReview Request: Bug 1270144 - fix message from timed out tests to indicate the following crash is caused by us terminating the browser process, r?ahal,RyanVM
Attachment #8751004 -
Flags: review?(ryanvm)
Assignee | ||
Comment 6•7 years ago
|
||
Comment on attachment 8751004 [details] MozReview Request: Bug 1270144 - fix message from timed out tests to indicate the following crash is caused by us terminating the browser process, r?ahal,RyanVM I looked at putting this in killAndGetStack(), but it seems that gets called for child processes as well so it'd spam the log. :-\ I'm using warning() rather than info() because I want it to show up in the treeherder summary.
Attachment #8751004 -
Flags: review+ → review?(ahalberstadt)
Updated•7 years ago
|
Attachment #8751004 -
Flags: review?(ahalberstadt) → review+
Comment 7•7 years ago
|
||
Comment on attachment 8751004 [details] MozReview Request: Bug 1270144 - fix message from timed out tests to indicate the following crash is caused by us terminating the browser process, r?ahal,RyanVM https://reviewboard.mozilla.org/r/51733/#review49247
Comment 8•7 years ago
|
||
Comment on attachment 8751004 [details] MozReview Request: Bug 1270144 - fix message from timed out tests to indicate the following crash is caused by us terminating the browser process, r?ahal,RyanVM https://reviewboard.mozilla.org/r/51733/#review49251 Wording feels a bit awkward to me. How about "Force-terminating active process(es)." instead?
Attachment #8751004 -
Flags: review?(ryanvm) → review+
Comment 10•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/cc0133021748
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla49
Assignee | ||
Comment 11•7 years ago
|
||
(In reply to :Gijs Kruitbosch from comment #6) > I'm using warning() rather than info() because I want it to show up in the > treeherder summary. This doesn't seem to have helped. What's guaranteed to show up in the treeherder logs?
Status: RESOLVED → REOPENED
Flags: needinfo?(ahalberstadt)
Resolution: FIXED → ---
Comment 12•7 years ago
|
||
You mean the blue summary? Yeah, pretty sure that needs to be at the ERROR level. Ed, is there another way to get a log in the treeherder summary? Or is using ERROR the best bet?
Flags: needinfo?(ahalberstadt) → needinfo?(emorley)
Comment 13•7 years ago
|
||
Could you paste links to some sample logs and/or lines from them? :-)
Flags: needinfo?(emorley)
Assignee | ||
Comment 14•7 years ago
|
||
(In reply to Ed Morley [:emorley] from comment #13) > Could you paste links to some sample logs and/or lines from them? :-) https://treeherder.mozilla.org/#/jobs?repo=try&revision=20211baffa40&selectedJob=21062223 raw log looks like: 12:23:15 WARNING - TEST-UNEXPECTED-TIMEOUT | browser/components/search/test/browser_contextSearchTabPosition.js | application timed out after 330 seconds with no output 12:23:15 INFO - Force-terminating active process(es). 12:23:15 INFO - Determining child pids from psutil 12:23:15 INFO - Found child pids: [] 12:23:15 INFO - Killing process: 492 12:23:15 INFO - Not taking screenshot here: see the one that was previously logged 12:23:15 INFO - TEST-INFO | Main app process: exit 1 12:23:15 WARNING - TEST-UNEXPECTED-FAIL | browser/components/search/test/browser_contextSearchTabPosition.js | application terminated with exit code 1 The code is here: https://dxr.mozilla.org/mozilla-central/source/testing/mochitest/runtests.py#2443-2450 error_message = "TEST-UNEXPECTED-TIMEOUT | %s | application timed out after %d seconds with no output" % ( self.lastTestSeen, int(timeout)) self.message_logger.dump_buffered() self.message_logger.buffering = False self.log.info(error_message) self.log.warning("Force-terminating active process(es).");
Flags: needinfo?(emorley)
Comment 15•7 years ago
|
||
Oh, in that case this could be a bug in the tbplformatter not including the level in the log message. It could be doing this to avoid redundant log levels due to the mozharness info level. Assuming treeherder will highlight WARNING, then this is likely a bug in mozharness. The proper fix is to get mozharness to read from the structured log (bug 1261194). This way, mozharness can grab the 'level' from the JSON object and we don't need to print it twice. Ed, I'll leave the needinfo for this question: will log lines with WARNING in them get highlighted?
Comment 16•7 years ago
|
||
Treeherder will match on {ERROR,CRITICAL,FATAL}: https://github.com/mozilla/treeherder/blob/17eb4bc74cd4851b52bf12e087ef8d624dbb028c/treeherder/log_parser/parsers.py#L405 And additionally on any of the message strings that match: https://github.com/mozilla/treeherder/blob/17eb4bc74cd4851b52bf12e087ef8d624dbb028c/treeherder/log_parser/parsers.py#L356-L395 Ideally this should use `.error()` - long term I'd like to see the list of hardcoded error strings reduce in size, rather than grow, as more suites/steps are wrapped in mozharness-style error level prefixes.
Flags: needinfo?(emorley)
Comment 18•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/e2f558143b0b
Status: REOPENED → RESOLVED
Closed: 7 years ago → 7 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 19•7 years ago
|
||
(In reply to Ed Morley [:emorley] from comment #16) > Treeherder will match on {ERROR,CRITICAL,FATAL}: > https://github.com/mozilla/treeherder/blob/ > 17eb4bc74cd4851b52bf12e087ef8d624dbb028c/treeherder/log_parser/parsers. > py#L405 > > And additionally on any of the message strings that match: > https://github.com/mozilla/treeherder/blob/ > 17eb4bc74cd4851b52bf12e087ef8d624dbb028c/treeherder/log_parser/parsers. > py#L356-L395 > > Ideally this should use `.error()` - long term I'd like to see the list of > hardcoded error strings reduce in size, rather than grow, as more > suites/steps are wrapped in mozharness-style error level prefixes. This *still* doesn't work: https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=29519107#L3965 It doesn't look like these log levels work the way you'd expect as they end up in this log as "INFO" first and "ERROR" later, and the treeherder matching expression only cares about the first part of the string. Andrew, do you know why?
Flags: needinfo?(ahalberstadt)
Updated•7 years ago
|
Summary: https://dxr.mozilla.org/mozilla-central/rev/0a25833062a880f369e6f9f622413a94cc671bf4/testing/mochitest/runtests.py#2444-2445 and friends should indicate they will kill the process → handleTimeout() and friends should indicate they will kill the process
Comment 20•7 years ago
|
||
Ugh, regex log parsing is horrible and this is a great example why. There's actually two places that the log is being parsed via regexes, mozharness and treeherder. First mozharness has its own log levels, and uses regexes to determine when something should get logged above INFO: https://dxr.mozilla.org/mozilla-central/rev/c9edfe35619f69f7785776ebd19a3140684024dc/testing/mozharness/mozharness/mozilla/testing/errors.py#92 Notice there's nothing about ERROR in there, so that's why your log is still showing up with the INFO mozharness level. Secondly, treeherder now parses the (already processed by mozharness) log. Going from what Ed linked: https://github.com/mozilla/treeherder/blob/17eb4bc74cd4851b52bf12e087ef8d624dbb028c/treeherder/log_parser/parsers.py#L405 It looks like ERROR is only highlighted if it appears at the start of the line (i.e if mozharness thinks it's an error). So we either need to: 1) Modify mozharness regexes to flag "93 ERROR" at the error level 2) Modify treeherder regexes to look at the whole string, not just the beginning 3) Use log.info("TEST-UNEXPECTED-ERROR | ...") Dealing with shit like this is why there are so many "TEST-UNEXPECTED" logs even when it's not related to a specific test. Alternatively you can leave it as is and wait for bug 1068138 and dependencies. If it's important to get fixed asap, just go with 3). The first two options might be hard to land.
Flags: needinfo?(ahalberstadt)
You need to log in
before you can comment on or make changes to this bug.
Description
•