Open Bug 1635543 Opened 5 years ago Updated 4 years ago

when tests fail and there isn't a test path in the failure message, output the directory/manifest that is run

Categories

(Testing :: General, task, P3)

Version 3
task

Tracking

(Not tracked)

People

(Reporter: jmaher, Unassigned)

References

(Blocks 1 open bug)

Details

Often we have failures that are leaks, crashes, assertions, timeouts, or infrastructure problems. Many times this is the result of the browser failing something during the set of tests, with the failure only detected on shutdown.

In order for ML test scheduling to properly annotate the failure, we should ensure these failures have a manifest or some related identifier in the error output.

taking this as an example:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&selectedTaskRun=AAl6hTQhSIqbQCYKQNZEKw-0&revision=385f49adaf00d02fc8e04da1e0031e3477182d67&searchStr=mda2

this is what treeherder shows as a failure:

TEST-UNEXPECTED-FAIL | Last test finished | application terminated with exit code 11
Exceeded max 20 bug suggestions, most of which are likely false positives.
PROCESS-CRASH | Last test finished | application crashed [@ mozilla::(anonymous namespace)::RunWatchdog(void*)]
Exceeded max 20 bug suggestions, most of which are likely false positives.
Return code: 1
Got 1 unexpected crashes
# TBPL FAILURE #
The mochitest suite: mochitest-media ran with return status: FAILURE

and what I see in the errorsummary.log file:
{"action": "log", "message": "TEST-UNEXPECTED-FAIL | Last test finished | application terminated with exit code 11", "line": 2186, "level": "ERROR"}

I would start with a proposal that we should make:
ERROR - TEST-UNEXPECTED-FAIL | Last test finished | application terminated with exit code 11

look like:
ERROR - TEST-UNEXPECTED-FAIL | dom/media/test/ | Last test finished: application terminated with exit code 11

there are 2 things different here:

  1. the directory name is now the second field of the line
  2. the original second field, "Last test finished", is now part of the error message, with a ":" separating it from the original error message.
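
As a rough illustration of the two changes listed above, here is a minimal sketch that rewrites an existing three-field line into the proposed form. The helper name add_directory is hypothetical and this is not the code in runtests.py; it only assumes the fields are separated by " | " as in the log lines quoted in this bug.

```python
def add_directory(line, directory):
    """Insert `directory` as the second field of a three-field failure line.

    Hypothetical helper for illustration only, not the runtests.py code.
    """
    # Split at most twice so a " | " inside the message itself is preserved.
    parts = line.split(" | ", 2)
    if len(parts) != 3:
        # Not a "status | context | message" line; leave it untouched.
        return line
    status, context, message = parts
    # The old second field moves into the message, separated by ": ".
    return "{} | {} | {}: {}".format(status, directory, context, message)


old = ("ERROR - TEST-UNEXPECTED-FAIL | Last test finished | "
       "application terminated with exit code 11")
new = add_directory(old, "dom/media/test/")
print(new)
```

In the harness itself the directory would more likely be passed in when the message is first built rather than patched in afterwards; the string rewrite here is only meant to show the before and after shapes.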

For this specific failure we would need to edit:
https://searchfox.org/mozilla-central/source/testing/mochitest/runtests.py#2359

The caveat is that we have a crash:
application crashed [@ mozilla::(anonymous namespace)::RunWatchdog(void*)]

that could get overlooked when it shows up in other suites and manifests, because we are pinning it to a specific directory (in this case dom/media/test/).

Severity: -- → N/A
Priority: -- → P3

How frequent are these kinds of failures (if you exclude intermittents)? Should it be a higher priority for us?

here is another case:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=305636826&repo=autoland&lineNumber=2177

and a bit from the structured log:
..."source": "mochitest", "reason": "MOZ_RELEASE_ASSERT(domWindow)", "test": "remoteautomation.py", "minidump_path": "/tmp/tmpzcNdn3/1a536fd3-594a-9c78-d41c-bf405e0d4c1f.dmp", "time": 1591712678616, "action": "crash"}

ekyle would like to see something in the structured log to indicate:
"group":"dom/media/test/mochitest.ini" / "test":None
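
A minimal sketch of what that annotation could look like on a crash record, assuming the record is handled as a plain dict. The function annotate_crash and the harness_files parameter are hypothetical; the record below reuses only the fields visible in the snippet above, and the group value is the one from the example in this comment, not necessarily the manifest that job was running.

```python
import json


def annotate_crash(record, group, harness_files=("remoteautomation.py",)):
    """Return a copy of `record` with a "group" and no harness file as "test".

    Hypothetical helper; `harness_files` is an assumed list of names that
    are harness code rather than real tests.
    """
    out = dict(record)
    out["group"] = group
    if out.get("test") in harness_files:
        # The crash cannot be blamed on a test, so leave "test" empty.
        out["test"] = None
    return out


crash = {
    "action": "crash",
    "source": "mochitest",
    "reason": "MOZ_RELEASE_ASSERT(domWindow)",
    "test": "remoteautomation.py",
}
annotated = annotate_crash(crash, "dom/media/test/mochitest.ini")
# json.dumps writes Python None as JSON null.
print(json.dumps(annotated, sort_keys=True))
```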

:ekyle, do you want to add more, or does this cover our conversation?

Flags: needinfo?(klahnakoski)
See Also: → 1548715

My only request is to not fill the "test" field if the failure cannot be blamed on a test. Adding a "group" field to the line is optional.

Adding "action":"group_start" and "action":"group_end" records would be a good way to mark tests as belonging to a group, AND provide a way to describe the failure of a group, but not a test.
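
A sketch of how a log consumer might use such records, assuming "group_start" and "group_end" are emitted as proposed here (this is not a description of what the harness currently logs) and that records are plain dicts. The function name attribute_failures and the test path in the sample data are made up for illustration.

```python
def attribute_failures(records):
    """Map tests to groups and collect crashes that belong to no test.

    Assumes the proposed "group_start"/"group_end" actions bracket the
    records of each group.
    """
    current_group = None
    tests_by_group = {}
    group_failures = []
    for rec in records:
        action = rec.get("action")
        if action == "group_start":
            current_group = rec["group"]
            tests_by_group.setdefault(current_group, [])
        elif action == "group_end":
            current_group = None
        elif action == "test_start" and current_group is not None:
            tests_by_group[current_group].append(rec["test"])
        elif action == "crash" and rec.get("test") is None:
            # No test to blame: attribute the crash to the open group.
            group_failures.append((current_group, rec.get("reason")))
    return tests_by_group, group_failures


records = [
    {"action": "group_start", "group": "dom/media/test/mochitest.ini"},
    # Hypothetical test path, for illustration only.
    {"action": "test_start", "test": "dom/media/test/test_example.html"},
    {"action": "crash", "test": None, "reason": "MOZ_RELEASE_ASSERT(domWindow)"},
    {"action": "group_end", "group": "dom/media/test/mochitest.ini"},
]
tests_by_group, group_failures = attribute_failures(records)
print(tests_by_group)
print(group_failures)
```

With this shape, a shutdown crash that arrives before "group_end" is still tied to a manifest even though its "test" field is empty.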

Flags: needinfo?(klahnakoski)