Closed Bug 1433923 Opened 6 years ago Closed 6 years ago

Use detailed failure messages for wdspec test failures (pytest)

Categories

(Testing :: web-platform-tests, enhancement, P1)

Version 3
enhancement

Tracking

(firefox60 fixed, firefox61 fixed)

RESOLVED FIXED
mozilla61
Tracking Status
firefox60 --- fixed
firefox61 --- fixed

People

(Reporter: whimboo, Assigned: whimboo)

Details

Attachments

(1 file)

The current situation is that we have basically catch all failures bugs reported on Bugzilla of type:

> TEST-UNEXPECTED-FAIL | /webdriver/tests/actions/mouse.py | mouse.py::test_click_element_center 

By that line you don't see anything what went wrong, and whatever how many failures potentially could be present, all the reported failures on Treeherder would be starred against this single bug.

Can we extend the output by adding the exact failure like:

> TEST-UNEXPECTED-FAIL | /webdriver/tests/actions/mouse.py mouse.py::test_click_element_center | AssertionError: 342 == 344

This would help a lot in triaging and fixing known failures.
James, is there anything which would prevent us from making such a change in reporting? If not I would still have to figure out how to make that happen with pytest. Maybe Dave knows.
Flags: needinfo?(james)
I don't think this is really related to pytest. The pytest results are collected and reported in mozlog format at [1]. That's then serialized in tbpl format at [2]. The format there is "TEST-<STATUS> | <test name> | <subtest name> - <message>\n<stack>". So probably you want to ensure that the message property is populated with something.

[1] https://searchfox.org/mozilla-central/source/testing/web-platform/tests/tools/wptrunner/wptrunner/executors/pytestrunner/runner.py#86
[2] https://searchfox.org/mozilla-central/source/testing/mozbase/mozlog/mozlog/formatters/tbplformatter.py#25
Flags: needinfo?(james)
The message comes with the data, but is only used when it is present in the dictionary:

https://searchfox.org/mozilla-central/rev/b29daa46443b30612415c35be0a3c9c13b9dc5f6/testing/mozbase/mozlog/mozlog/formatters/tbplformatter.py#175

> message = "- " + data["message"] if "message" in data else ""

Given that we do not get a message at all it means that no message is present. Maybe we fail to set this property, or are trying to retrieve the wrong information from the pytest results.
Given that I do good amount of triage work for those tests too I will take this bug, which IMO is kinda important to fix.
Assignee: nobody → hskupin
Status: NEW → ASSIGNED
Priority: -- → P1
Running wdspec tests locally clearly give me a null message for test failures inside the errorsummary.log file:

{"status": "FAIL", "group": "default", "line": 169, "subtest": "test_example", "action": "test_result", "test": "/webdriver/tests/cookies/add_cookie.py", "message": null, "stack": "session = <webdriver.client.Session object at 0x10e0811d0>, url = <function url at 0x10ba16398>\nserver_config = {'aliases': [], 'bind_hostname': False, 'certificate': '/Volumes/data/code/gecko/testing/web-platform/certs/web-platform.test.pem', 'check_subdomains': False, ...}\n\n    def test_example(session, url, server_config):\n>       assert \"asdf\" in \"aaaaaaaa\"\nE       AssertionError: assert 'asdf' in 'aaaaaaaa'\n\nserver_config = {'aliases': [], 'bind_hostname': False, 'certificate': '/Volumes/data/code/gecko/testing/web-platform/certs/web-platform.test.pem', 'check_subdomains': False, ...}\nsession    = <webdriver.client.Session object at 0x10e0811d0>\nurl        = <function url at 0x10ba16398>\n\ntesting/web-platform/tests/webdriver/tests/cookies/add_cookie.py:181: AssertionError", "expected": "PASS"}

Going up the stack and checking `TestRunnerManager.test_ended()` the passed in test results do not contain a message, which means that we miss to populate the `message` field in the call to `WdspecRun()`. In `do_wdspec()` we call `pytestrunner.run()`, which passes in a `SubtestResultRecorder` into the `pytest.main()` call. So I'm fairly sure this result recorder class doesn't work as expected for pytest based executors.
So it's indeed a bug in our test result collection from pytest. We do not explicitly set a message for `record()` and as such it is kept as None:

https://dxr.mozilla.org/mozilla-central/source/testing/web-platform/tests/tools/wptrunner/wptrunner/executors/pytestrunner/runner.py#104

By using the `pytest_runtest_logreport` hook, I do not see any chance for us to have a simple failure message available. Only `report.longrepr` contains it in the stack. So I could extract the message like:

>     def record_fail(self, report):
>         match = re.search("^E\s(.*)", report.longreprtext, re.MULTILINE)
>         if match:
>             message = match.group(1).strip()
>         self.record(report.nodeid, "FAIL", message=message, stack=report.longrepr)

It will result in a message like `"message": "AssertionError: assert 'asdf' in 'aaaaaaaa'"` for the above shown stack.
Btw the pytest-html plugin also uses this method to retrieve the assertion details:
https://github.com/pytest-dev/pytest-html/blob/c61dc5bbe08fc7dfb20d27df8adc7a61eb04a902/pytest_html/plugin.py#L230-L239
I pushed a try build:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=def3228e264856c912e6be58366af012f8f919b2&selectedJob=169642861

The failing test is getting reported as:

> TEST-UNEXPECTED-FAIL | /webdriver/tests/cookies/add_cookie.py | test_example - AssertionError: assert 'asdf' in 'aaaaaaaa'

I think that doesn't apply to our requirements for Treeherder log parsing. After the second `|` only the test failure has to be shown. So the subtest name would have to be moved.
Attachment #8961345 - Flags: review?(james)
(In reply to James Graham [:jgraham] from comment #2)
> That's then serialized in tbpl format at [2]. The format there is:
> "TEST-<STATUS> | <test name> | <subtest name> - <message>\n<stack>".

Is that format really what we want? It differs from all the other test jobs as mentioned in my last comment already. If we want to get this fixed I would suggest to file a new bug given that it is unrelated to the message part.
Flags: needinfo?(james)
Comment on attachment 8961345 [details]
Bug 1433923 - Use detailed failure message for wdspec test failures (pytest).

https://reviewboard.mozilla.org/r/230134/#review235780

::: testing/web-platform/tests/tools/wptrunner/wptrunner/executors/pytestrunner/runner.py:106
(Diff revision 1)
>  
>      def record_pass(self, report):
>          self.record(report.nodeid, "PASS")
>  
>      def record_fail(self, report):
> -        self.record(report.nodeid, "FAIL", stack=report.longrepr)
> +        match = re.search("^E\s(.*)", report.longreprtext, re.MULTILINE)

So… you are looking for a line that starts with "E"? It's very unclear what's actually going on here.
Attachment #8961345 - Flags: review?(james) → review-
>> That's then serialized in tbpl format at [2]. The format there is:
>> "TEST-<STATUS> | <test name> | <subtest name> - <message>\n<stack>".
> Is that format really what we want? It differs from all the other test jobs as mentioned in my last comment already. If we want to get this fixed I would suggest to file a new bug given that it is unrelated to the message part.

That was a statement of fact. The format is carefully designed  to work with the treeherder parser, and I'm not proposing changing anything.
Flags: needinfo?(james)
Comment on attachment 8961345 [details]
Bug 1433923 - Use detailed failure message for wdspec test failures (pytest).

https://reviewboard.mozilla.org/r/230134/#review235780

> So… you are looking for a line that starts with "E"? It's very unclear what's actually going on here.

Yes, the error details are listed as following in the stack:

>     def test_example(session, url, server_config):
> >       assert "asdf" in "aaaaaaaa"
> E       AssertionError: assert 'asdf' in 'aaaaaaaa'

In pytest-html Dave looks for a line starting with `E   `. So would you be fine when it requires at least three blanks afterward?
Comment on attachment 8961345 [details]
Bug 1433923 - Use detailed failure message for wdspec test failures (pytest).

https://reviewboard.mozilla.org/r/230134/#review235780

> Yes, the error details are listed as following in the stack:
> 
> >     def test_example(session, url, server_config):
> > >       assert "asdf" in "aaaaaaaa"
> > E       AssertionError: assert 'asdf' in 'aaaaaaaa'
> 
> In pytest-html Dave looks for a line starting with `E   `. So would you be fine when it requires at least three blanks afterward?

The above is not formattet correctly. Maybe this time it is better:

```
    def test_example(session, url, server_config):
>       assert "asdf" in "aaaaaaaa"
E       AssertionError: assert 'asdf' in 'aaaaaaaa'
```
Comment on attachment 8961345 [details]
Bug 1433923 - Use detailed failure message for wdspec test failures (pytest).

https://reviewboard.mozilla.org/r/230134/#review235780

> The above is not formattet correctly. Maybe this time it is better:
> 
> ```
>     def test_example(session, url, server_config):
> >       assert "asdf" in "aaaaaaaa"
> E       AssertionError: assert 'asdf' in 'aaaaaaaa'
> ```

So the most important thing is to document what we're trying to match since this is basically an ad-hoc parser. So the minimum requirement here is a comment.

It is also possible that the code will look simpler like
```
# pytest outputs the stacktrace followed by an error message prefixed by the letter E
# e.g.
# <example>
msg = ""
for line in report.longreprtext.splitlines():
   if line.startswith("E"):
       msg = line[1:].strip()
       break
```

But it depends what the exact semantics  are. Can you have > 1 E-prefixed line? Should you take everything after the first E prefixed line?
Comment on attachment 8961345 [details]
Bug 1433923 - Use detailed failure message for wdspec test failures (pytest).

https://reviewboard.mozilla.org/r/230134/#review235780

> So the most important thing is to document what we're trying to match since this is basically an ad-hoc parser. So the minimum requirement here is a comment.
> 
> It is also possible that the code will look simpler like
> ```
> # pytest outputs the stacktrace followed by an error message prefixed by the letter E
> # e.g.
> # <example>
> msg = ""
> for line in report.longreprtext.splitlines():
>    if line.startswith("E"):
>        msg = line[1:].strip()
>        break
> ```
> 
> But it depends what the exact semantics  are. Can you have > 1 E-prefixed line? Should you take everything after the first E prefixed line?

Dave pointed me to https://github.com/pytest-dev/pytest/blob/ed118d7f2051ccfdb5c56cb6535e6f89485b1225/_pytest/_code/code.py#L531-L541, which shows that by default an error line is prefixed with "E   ".

I think for our use it's best to just limit to the first line as summary for the failure log. Also I have never seen a multi-line error yet.

So I think your example code is better because it returns immediately after the first error line. The regex would return everything.
Also I filed an issue against pytest to maybe get a property which would allow directly fetching the assertion message:
https://github.com/pytest-dev/pytest/issues/3343
Comment on attachment 8961345 [details]
Bug 1433923 - Use detailed failure message for wdspec test failures (pytest).

https://reviewboard.mozilla.org/r/230134/#review236780

::: testing/web-platform/tests/tools/wptrunner/wptrunner/executors/pytestrunner/runner.py:15
(Diff revision 2)
>  """
>  
>  import errno
>  import json
>  import os
> +import re

Unused.

::: testing/web-platform/tests/tools/wptrunner/wptrunner/executors/pytestrunner/runner.py
(Diff revision 2)
> +
> +        self.record(report.nodeid, "FAIL", message=message, stack=report.longrepr)
>  
>      def record_error(self, report):
>          # error in setup/teardown
> -        if report.when != "call":

Why this change?
Attachment #8961345 - Flags: review?(james) → review+
Comment on attachment 8961345 [details]
Bug 1433923 - Use detailed failure message for wdspec test failures (pytest).

https://reviewboard.mozilla.org/r/230134/#review236780

> Why this change?

We already have this check above. But I will revert it so that we only touch the necessary code.
Pushed by hskupin@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/849020c741de
Use detailed failure message for wdspec test failures (pytest). r=jgraham
Sheriffs please note that with this change the bug suggestions will change for wdspec tests. Especially for all of those which are not timeout failures. Existing bugs will need an update of the summary, or if filed as a new bug, I will dupe them later today or tomorrow. Thanks.
https://hg.mozilla.org/mozilla-central/rev/849020c741de
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
I would like to see this patch uplifted to mozilla-beta for a better failure discovery of wd test jobs.
Whiteboard: [checkin-needed-beta]
Whiteboard: [checkin-needed-beta]
Created web-platform-tests PR https://github.com/w3c/web-platform-tests/pull/10215 for changes under testing/web-platform/tests
Hm, why didn't the linter pick up this indentation error:

+        message = ""
+        for line in report.longreprtext.splitlines():
+           if line.startswith("E   "):
+               message = line[1:].strip()

That is the reason why the upstream merge failed. Maja and James, I assume I don't have to fix it in central given that the next merge will fix it automatically, right?
Flags: needinfo?(mjzffr)
Flags: needinfo?(james)
Flags: needinfo?(ahalberstadt)
Because testing/web-platform isn't enabled for the flake8 linter:
https://searchfox.org/mozilla-central/source/tools/lint/flake8.yml#4
Flags: needinfo?(ahalberstadt)
Right.
Flags: needinfo?(mjzffr)
Flags: needinfo?(james)
(In reply to Andrew Halberstadt [:ahal] from comment #32)
> Because testing/web-platform isn't enabled for the flake8 linter:
> https://searchfox.org/mozilla-central/source/tools/lint/flake8.yml#4

So as informed by Andreas we have a different set of linter settings available for web-platform-tests, and those should be run via `mach lint -l wpt -l wpt_manifest` (https://developer.mozilla.org/en-US/docs/Mozilla/QA/web-platform-tests#Lint). But whatever I do I cannot get the linter to fail for the above broken indendation.

Two questions:

1) How do I have to run the linter to really lint testing/web-platform/tests/tools/wptrunner/wptrunner/executors/pytestrunner/runner.py for Python code

2) Is there a way to get this linter integrated with any linting job on Taskcluster?
Flags: needinfo?(james)
Flags: needinfo?(ahalberstadt)
We should enable pyflakes for the same directories and with the same settings as upstream travis does i.e. https://github.com/w3c/web-platform-tests/blob/master/tools/tox.ini#L25
Flags: needinfo?(james)
Sounds like the lint integration doesn't automatically set up pyflakes the same way it does upstream. I consider this a bug in the linter implementation here:
https://searchfox.org/mozilla-central/source/tools/lint/wpt/wpt.py

Standard8 and I are trying to push new restrictions on our linters such that they just work out of the box (without requiring any additional setup on the part of the user). So please feel free to file under Firefox Build System :: Lint and Formatting if you like.

As far as taskcluster, there already is a job:
https://searchfox.org/mozilla-central/source/taskcluster/ci/source-test/mozlint.yml#142

It'll only run if you modified one of those files. It might also be hitting the same pyflakes issue that you are though.
Flags: needinfo?(ahalberstadt)
Maja, is it expected that we get the "Upstream PR merged" comment twice from the wpt-sync bot?
Flags: needinfo?(mjzffr)
This may me due to some semi-manual updates we've done here and there to catch up with bugs that have been missed by the upstreaming process. Many fixes to upstreaming have been deployed today, so expect a change in activity.
Flags: needinfo?(mjzffr)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: