Closed Bug 1017559 Opened 10 years ago Closed 10 years ago

Marionette timeouts should output failures in a way that does not cause mozharness and TBPL false positives

Categories

(Remote Protocol :: Marionette, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Assigned: jgriffin)

References

(Blocks 1 open bug)

Details

(Keywords: sheriffing-P1)

Attachments

(3 files)

From bug 1016592.

Example:
https://tbpl.mozilla.org/php/getParsedLog.php?id=40629848&tree=B2g-Inbound
b2g_ubuntu64_vm b2g-inbound opt test gaia-ui-test on 2014-05-29 04:35:51 PDT for push 9e063b5adb2a
{
04:53:43     INFO -  TEST-START test_dialer_add_contact.py
04:54:30     INFO -  test_dialer_add_contact (test_dialer_add_contact.TestDialerAddContact) ... ERROR
04:54:30     INFO -  ======================================================================
04:54:30     INFO -  ERROR: None
04:54:30     INFO -  ----------------------------------------------------------------------
04:54:30     INFO -  Traceback (most recent call last):
04:54:30     INFO -    File "/builds/slave/test/build/venv/local/lib/python2.7/site-packages/marionette/marionette_test.py", line 170, in run
04:54:30     INFO -      testMethod()
04:54:30     INFO -    File "/builds/slave/test/gaia/tests/python/gaia-ui-tests/gaiatest/tests/functional/dialer/test_dialer_add_contact.py", line 47, in test_dialer_add_contact
04:54:30     INFO -      contacts = self.phone.tap_contacts()
04:54:30     INFO -    File "/builds/slave/test/gaia/tests/python/gaia-ui-tests/gaiatest/apps/phone/app.py", line 41, in tap_contacts
04:54:30     INFO -      return self.switch_to_contacts()
04:54:30     INFO -    File "/builds/slave/test/gaia/tests/python/gaia-ui-tests/gaiatest/apps/phone/app.py", line 31, in switch_to_contacts
04:54:30     INFO -      self.wait_for_element_present(*self._contacts_frame_locator)
04:54:30     INFO -    File "/builds/slave/test/gaia/tests/python/gaia-ui-tests/gaiatest/apps/base.py", line 29, in wait_for_element_present
04:54:30     INFO -      lambda m: m.find_element(by, locator))
04:54:30     INFO -    File "/builds/slave/test/build/venv/local/lib/python2.7/site-packages/marionette/wait.py", line 143, in until
04:54:30     INFO -      cause=last_exc)
04:54:30    ERROR -  TimeoutException: Traceback (most recent call last):
04:54:30     INFO -    File "/builds/slave/test/build/venv/local/lib/python2.7/site-packages/marionette/wait.py", line 122, in until
04:54:30     INFO -      rv = condition(self.marionette)
04:54:30     INFO -    File "/builds/slave/test/gaia/tests/python/gaia-ui-tests/gaiatest/apps/base.py", line 29, in <lambda>
04:54:30     INFO -      lambda m: m.find_element(by, locator))
04:54:30     INFO -    File "/builds/slave/test/build/venv/local/lib/python2.7/site-packages/marionette/marionette.py", line 1213, in find_element
04:54:30     INFO -      response = self._send_message('findElement', 'value', **kwargs)
04:54:30     INFO -    File "/builds/slave/test/build/venv/local/lib/python2.7/site-packages/marionette/decorators.py", line 35, in _
04:54:30     INFO -      return func(*args, **kwargs)
04:54:30     INFO -    File "/builds/slave/test/build/venv/local/lib/python2.7/site-packages/marionette/marionette.py", line 633, in _send_message
04:54:30     INFO -      self._handle_error(response)
04:54:30     INFO -    File "/builds/slave/test/build/venv/local/lib/python2.7/site-packages/marionette/marionette.py", line 664, in _handle_error
04:54:30    ERROR -      raise NoSuchElementException(message=message, status=status, stacktrace=stacktrace)
04:54:30    ERROR -  TEST-UNEXPECTED-FAIL | test_dialer_add_contact.py test_dialer_add_contact.TestDialerAddContact.test_dialer_add_contact | TimeoutException: Traceback (most recent call last):
04:54:30     INFO -  ----------------------------------------------------------------------
}

This line of output is problematic:
04:54:30    ERROR -  TimeoutException: Traceback (most recent call last):

...since it matches against the regexp used by both mozharness and TBPL to find Python exceptions - and isn't specific to the test that failed, so will cause false positives for TBPL's bug suggestions.

I'm presuming the nested exceptions weren't intentional?
Flags: needinfo?(dburns)
On a related note, for marionette-webapi we sometimes get:
https://tbpl.mozilla.org/php/getParsedLog.php?id=40631191&tree=B2g-Inbound
b2g_emulator_vm b2g-inbound opt test marionette-webapi on 2014-05-29 04:39:21 PDT for push 9e063b5adb2a
{
04:49:34     INFO -  TEST-START test_log.py
04:55:40     INFO -  test_log_basic (test_log.TestLog) ... ERROR
04:55:42     INFO -  test_log_script (test_log.TestLog) ... ok
04:55:44     INFO -  test_that_we_can_clear_the_logs (test_log.TestLog) ... ok
04:55:45     INFO -  test_log_basic (test_log.TestLogChrome) ... ok
04:55:47     INFO -  test_log_script (test_log.TestLogChrome) ... ok
04:55:49     INFO -  test_that_we_can_clear_the_logs (test_log.TestLogChrome) ... ok
04:55:49     INFO -  ======================================================================
04:55:49     INFO -  ERROR: None
04:55:49     INFO -  ----------------------------------------------------------------------
04:55:49     INFO -  Traceback (most recent call last):
04:55:49     INFO -    File "/builds/slave/test/build/tests/marionette/marionette/marionette_test.py", line 198, in run
04:55:49     INFO -      self.tearDown()
04:55:49     INFO -    File "/builds/slave/test/build/tests/marionette/marionette/marionette_test.py", line 348, in tearDown
04:55:49     INFO -      (self.filepath.replace('\\', '\\\\'), self.methodName))
04:55:49     INFO -    File "/builds/slave/test/build/tests/marionette/marionette/marionette.py", line 1138, in execute_script
04:55:49     INFO -      filename=os.path.basename(frame[0]))
04:55:49     INFO -    File "/builds/slave/test/build/tests/marionette/marionette/decorators.py", line 35, in _
04:55:49     INFO -      return func(*args, **kwargs)
04:55:49     INFO -    File "/builds/slave/test/build/tests/marionette/marionette/marionette.py", line 616, in _send_message
04:55:49     INFO -      "Connection timed out", status=ErrorCodes.TIMEOUT)
04:55:49    ERROR -  TimeoutException: TimeoutException: Connection timed out
04:55:49     INFO -  Most recent errors/exceptions are:
04:55:49     INFO -  05-29 07:49:24.013 E/GeckoConsole(   44): [JavaScript Error: "The character encoding of the HTML document was not declared. The document will render with garbled text in some browser configurations if the document contains characters from outside the US-ASCII range. The character encoding of the page must be declared in the document or in the transfer protocol." {file: "http://10.132.58.45:45274/test_clearing.html" line: 0}]
04:55:49     INFO -  05-29 07:49:25.505 I/Gecko   (   44): 1401364165507	Marionette	INFO	sendToClient: {"from":"0","error":{"message":"Element must be user-editable in order to clear it.","status":12,"stacktrace":"y@chrome://marionette/content/atoms.js:21:701\nid@chrome://marionette/content/atoms.js:87:1228\nclearElement/<@chrome://marionette/content/atoms.js:88:92\nclearElement@chrome://marionette/content/atoms.js:88:147\nclearElement@chrome://marionette/content/marionette-listener.js:1783:5\n"}}, {e9661a0a-07af-42c1-bb5f-393fc0f1ec28}, {e9661a0a-07af-42c1-bb5f-393fc0f1ec28}
04:55:49     INFO -  05-29 07:49:27.733 E/GeckoConsole(   44): [JavaScript Error: "The character encoding of the HTML document was not declared. The document will render with garbled text in some browser configurations if the document contains characters from outside the US-ASCII range. The character encoding of the page must be declared in the document or in the transfer protocol." {file: "http://10.132.58.45:45274/test_clearing.html" line: 0}]
04:55:49     INFO -  05-29 07:49:31.143 E/GeckoConsole(   44): [JavaScript Error: "The character encoding of the HTML document was not declared. The document will render with garbled text in some browser configurations if the document contains characters from outside the US-ASCII range. The character encoding of the page must be declared in the document or in the transfer protocol." {file: "http://10.132.58.45:45274/test_clearing.html" line: 0}]
04:55:49     INFO -  05-29 07:49:36.763 I/Gecko   (   44): 1401364176765	Marionette	INFO	sendToClient: {"from":"0","value":[["ERROR","I AM ERROR","Thu May 29 2014 07:49:36 GMT-0400 (EDT)"]]}, {523d802a-15a6-443c-b8bd-64f8aae7c6f3}, {523d802a-15a6-443c-b8bd-64f8aae7c6f3}
04:55:49    ERROR -  TEST-UNEXPECTED-FAIL | test_log.py test_log.TestLog.test_log_basic | TimeoutException: TimeoutException: Connection timed out
}

Actual:
04:55:49    ERROR -  TimeoutException: TimeoutException: Connection timed out
...
04:55:49    ERROR -  TEST-UNEXPECTED-FAIL | test_log.py test_log.TestLog.test_log_basic | TimeoutException: TimeoutException: Connection timed out

Expected:
04:55:49    INFO  -  some-prefix-that-doesn't-hit-regexp: Connection timed out
...
04:55:49    ERROR -  TEST-UNEXPECTED-FAIL | test_log.py test_log.TestLog.test_log_basic | TimeoutException: TimeoutException: Connection timed out
Keywords: sheriffing-P1
Summary: Gaia-ui-test timeouts should output failures in a way that does not cause mozharness and TBPL false positives → Marionette timeouts should output failures in a way that does not cause mozharness and TBPL false positives
Attached image screenshot.jpg
Screenshot to give an idea of the impact this has.
Bug 967318 was filed on 2014-02-04 - the behaviour of the harness has been regressed since then in various ways (see bug 837827 for an example of what it used to be) - and whilst I'm extremely grateful for the work done on this so far, we're now almost 4 months down the line and the various marionette-using suites are still not conforming to https://wiki.mozilla.org/Sheriffing/Job_Visibility_Policy - which is extremely frustrating/wearying from a sheriffing point of view.
Depends on: 1006511, 967318
I think the right answer here is probably changing the way we output exceptions so that we print a single line containing all the relevant data in TBPL-compatible format, and then we update the mozharness parser to ignore Marionette-specific stack traces (e.g., TimeoutException).
Looking at this more closely, I think we can handle this all on the mozharness side; I'm happy to take a crack at it.
Assignee: nobody → jgriffin
fwiw, Marionette is doing the right thing in this situation since it should be emulating webdriver. The nested exceptions are expected but due to the way the logs are being generated we are seeing them. 

I agree with Jonathan that this is a mozharness issue
Flags: needinfo?(dburns)
This changes the way that mozharness matches error for Marionette tests.  Previously, we were not matching TEST-UNEXPECTED-FAIL, but were matching any of the Marionette exception types, which caused us to match twice for every error: once for the stack trace, and once for the TEST-UNEXPECTED-FAIL line.

This patch reverses this logic and lets us match TEST-UNEXPECTED-FAIL, not match any standard Marionette exception, but do match any random Python exception.
Attachment #8431225 - Flags: review?(emorley)
(In reply to David Burns :automatedtester from comment #6)
> fwiw, Marionette is doing the right thing in this situation since it should
> be emulating webdriver. The nested exceptions are expected but due to the
> way the logs are being generated we are seeing them. 

I understand that the aim is to emulate webdriver - but when it comes to log output, if it causes issues for all consumers of our logs, due to inconsistency with all of our existing test suites, then I think it's "no longer the right thing" for when it runs in our automation.

> I agree with Jonathan that this is a mozharness issue

The only downside to fixing this in mozharness, is that other consumers (eg TBPL, treeherder) have to search for Python exceptions too (not everything uses the mozharness wrapper + if mozharness raises, we need to be ready to catch that too). As such, unless we hardcode exceptions for marionette at every juncture, we're still going to hit this.
Comment on attachment 8431225 [details] [diff] [review]
Change the way we match Marionette errors in mozharness,

Review of attachment 8431225 [details] [diff] [review]:
-----------------------------------------------------------------

Thank you for doing this - I think it's an improvement, but won't get us all the way there to no false positives for the reasons in comment 8. However probably worth taking this for now either way :-)

Punting to aki, since he has been more involved with the base errors.py
Attachment #8431225 - Flags: review?(emorley) → review?(aki)
Status: NEW → ASSIGNED
Attachment #8431225 - Flags: review?(aki) → review+
(In reply to Ed Morley [:edmorley UTC+0] from comment #8)
> (In reply to David Burns :automatedtester from comment #6)
> 
> > I agree with Jonathan that this is a mozharness issue
> 
> The only downside to fixing this in mozharness, is that other consumers (eg
> TBPL, treeherder) have to search for Python exceptions too (not everything
> uses the mozharness wrapper + if mozharness raises, we need to be ready to
> catch that too). As such, unless we hardcode exceptions for marionette at
> every juncture, we're still going to hit this.

The real issue here is that Marionette is unique among our test tools in that a test failure (as opposed to a harness error, etc) is always triggered by a Python exception, and the stack trace is very useful in understanding the failure.

In our other harnesses, test failures and Python exceptions are independent of each other.

I think we will have to make Treeherder smart enough to deal with this in one way or another; I'll be interested to see if this is still problematic there after this mozharness patch goes into production.
Just merged this to production.
Since this merged, Marionette failures in TBPL show suggested lines for things like this:
test_cause_string (test_errors.TestMarionetteException) ... ok
test_cause_tuple (test_errors.TestMarionetteException) ... ok
test_construction (test_errors.TestMarionetteException) ... ok
test_defaults (test_errors.TestMarionetteException) ... ok
test_str (test_errors.TestMarionetteException) ... ok
The mozharness piece of this has landed.  Like you predicted Ed, it doesn't really solve the problem due to TBPL's parsing.  Treeherder doesn't seem to be bothered by this, though.

To fix TBPL, we'd have to apply similar regex exclusions here:

https://hg.mozilla.org/webtools/tbpl/file/0d9836ff8bf0/php/inc/GeneralErrorFilter.php#l51

Ed, would you like me to do this?
(In reply to Wes Kocher (:KWierso) from comment #13)
> Since this merged, Marionette failures in TBPL show suggested lines for
> things like this:
> test_cause_string (test_errors.TestMarionetteException) ... ok
> test_cause_tuple (test_errors.TestMarionetteException) ... ok
> test_construction (test_errors.TestMarionetteException) ... ok
> test_defaults (test_errors.TestMarionetteException) ... ok
> test_str (test_errors.TestMarionetteException) ... ok

Whoops, I missed an exclusion, I'll fix ASAP.
(In reply to Jonathan Griffin (:jgriffin) from comment #15)
> (In reply to Wes Kocher (:KWierso) from comment #13)
> > Since this merged, Marionette failures in TBPL show suggested lines for
> > things like this:
> > test_cause_string (test_errors.TestMarionetteException) ... ok
> > test_cause_tuple (test_errors.TestMarionetteException) ... ok
> > test_construction (test_errors.TestMarionetteException) ... ok
> > test_defaults (test_errors.TestMarionetteException) ... ok
> > test_str (test_errors.TestMarionetteException) ... ok
> 
> Whoops, I missed an exclusion, I'll fix ASAP.

https://hg.mozilla.org/build/mozharness/rev/e9bc9209ac0a and pushed to production.
This applies the same logic to TBPL's parsing.
Attachment #8432542 - Flags: review?(emorley)
Comment on attachment 8432542 [details] [diff] [review]
Exclude Marionette exceptions from TBPL error flagging,

Review of attachment 8432542 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with the change below :-)

::: php/inc/GeneralErrorFilter.php
@@ +32,5 @@
>      $line = preg_replace($mozharnessPrefixRE, '', $fullLine);
> +    // Flag Python exceptions _except_ for Marionette exceptions, all of which should
> +    // get handled with TEST-UNEXPECTED-FAIL.
> +    if (preg_match("/^(?!Marionette|TestMarionette|NoSuchElement|XPathLookup|NoSuchWindow|StaleElement|ScriptTimeout|ElementNotVisible|NoSuchFrame|InvalidResponse|Javascript|Timeout|InvalidElementState|NoAlertPresent|InvalidCookieDomain|UnableToSetCookie|InvalidSelector|MoveTargetOutOfBounds)\w*Exception: /", $line))
> +      return true;

This changes the pattern (excluding the negative lookahead) from:
[A-Za-z.]*Exception:
to:
\w*Exception:

\w is equivalent to [A-Za-z0-9_] which means we won't match against the period which is present in some of our exceptions (eg bug 989048).

Perhaps:
/^(?!...)[.\w]*Exception: /"

Thank you for doing this :-)
Attachment #8432542 - Flags: review?(emorley) → review+
Took me a while to figure out the right regular expression for this. :P  Turns out it is

/^[.\w]*\b(?!...)\w*Exception: /

https://hg.mozilla.org/webtools/tbpl/rev/8c4e63dc51a2
Looks like this is working well on tbpl-dev.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Blocks: 1019678
(In reply to Jonathan Griffin (:jgriffin) from comment #19)
> Took me a while to figure out the right regular expression for this. :P 
> Turns out it is
> 
> /^[.\w]*\b(?!...)\w*Exception: /

I think this actually changes behaviour slightly - and is also slightly more complicated than we need. I think maybe a whitelist approach (ie return false if we match the Marionette exceptions only, rather than adjusting the main exception regex) might be clearer. I'll file another bug and have a go there.
Blocks: 1020296
Depends on: 1028947
Product: Testing → Remote Protocol
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: