Closed Bug 901832 Opened 11 years ago Closed 6 years ago

Provide a command line option to output a manifest of test failures

Categories: Remote Protocol :: Marionette (enhancement, P3)
Platform: All / Other
Severity: enhancement
Tracking: Not tracked
Status: RESOLVED WONTFIX
People: Reporter: angelc04, Unassigned
Keywords: pi-marionette-runner

It would be very helpful if we could rerun failed tests automatically and keep all test results.
QA could then analyze the results and quickly identify whether a failure was caused by something intermittent or by a product issue.
I can see two options here. The first would be to simply output a manifest file detailing the failed tests. This could then be used to rerun the failures; however, the results/reports would be separate.

The second option would be to automatically rerun the failures and include either details of all runs or only the last run of each test in the results/reports. I think this is my personal preference, but it is probably more difficult to implement.
I think there are some details that should be defined:
1. When should we automatically rerun the failures?
   If we rerun a failure immediately after it fails, it is very likely to fail again for the same reason (poor data connection and so on).
   So I think it would be better to rerun after some time, for example after all tests have finished one run; then we start running the failed tests.
   In this case, Option 1 may be a good choice.

2. Which results/reports do we keep?
   I would prefer to keep the results for all runs, for comparison reasons. If we only keep the latest result, it would be the same as having just the first run.
   To keep all results, I think we can simply add a timestamp or test instance id to the end of the test suite name (like TS_Clock_XXXXXXXXXXXXXXXXX).


So the logic could be:
1. Suppose we have a list of test suites (each test suite has a set of tests), and we put them in a waiting list.
2. We take a test suite from the list and run it. After it finishes, we check whether there was a failure. If there was, we add the suite to the end of the waiting list and continue with the other test suites; if it passed, we just continue.

[concern]
1. If we re-add the whole test suite, the total running time will be long.
2. What if a test suite has already run twice and it still fails? I think there should be a limit on how many retries we allow. (A rough sketch of such a loop with a retry limit follows below.)
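
As a purely illustrative sketch, assuming a run_suite() callable that returns True on pass and False on failure (both the helper and the retry limit are made up here, not existing Marionette code), the waiting-list loop could look roughly like this:

from collections import deque

MAX_ATTEMPTS = 2  # hypothetical cap on how often a failing suite is retried

def run_with_retries(suites, run_suite):
    # run_suite(suite) is assumed to return True on pass, False on failure
    waiting = deque((suite, 1) for suite in suites)
    results = {}
    while waiting:
        suite, attempt = waiting.popleft()
        passed = run_suite(suite)
        results.setdefault(suite, []).append(passed)
        if not passed and attempt < MAX_ATTEMPTS:
            # push the failing suite back to the end of the waiting list
            waiting.append((suite, attempt + 1))
    return results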


What do you think about this?
(In reply to pcheng from comment #2)
> I think there are some details that should be defined:
> 1. When should we automatically rerun the failures?
>    If we rerun a failure immediately after it fails, it is very likely to
> fail again for the same reason (poor data connection and so on).
>    So I think it would be better to rerun after some time, for example
> after all tests have finished one run; then we start running the failed tests.
>    In this case, Option 1 may be a good choice.

I think both are valid use cases, and they could even be implemented separately: an option to create a manifest of failed tests, and an option to automatically rerun failed tests immediately.

> 2. Which results/reports do we keep?
>    I would prefer to keep the results for all runs, for comparison reasons.
> If we only keep the latest result, it would be the same as having just the
> first run.
>    To keep all results, I think we can simply add a timestamp or test
> instance id to the end of the test suite name (like
> TS_Clock_XXXXXXXXXXXXXXXXX).

From a QA perspective it's certainly useful to have the history; however, I would simply use an index for this rather than a timestamp. Others consuming the reports are likely to be interested only in the final result. If we use the manifest approach, however, we would have multiple distinct result sets, which would be difficult to consume.

> So the logic could be:
> 1. Suppose we have a list of test suites (each test suite has a set of
> tests), and we put them in a waiting list.
> 2. We take a test suite from the list and run it. After it finishes, we
> check whether there was a failure. If there was, we add the suite to the
> end of the waiting list and continue with the other test suites; if it
> passed, we just continue.

We don't currently have this concept of a waiting list. Tests are discovered at the start of the test run and then executed. Storing the failed tests in a new manifest file so the user can run them again should be easy enough, as should immediately rerunning the failed tests. This won't necessarily help tests that fail due to prolonged network issues, but we wouldn't want to mask such issues either.

> [concern]
> 1. If we re-add the whole test suite, the total running time will be long.

That's a valid concern. If all tests failed then they would all be rerun, which would considerably increase the length of the test run. The worst case scenario might be that all tests fail due to a socket/script/search timeout.

> 2. What if a test suite has already run twice and it still fails? I think
> there should be a limit on how many retries we allow.

I agree. If we automatically rerun failures then we should either just rerun once, or provide a command line argument to specify how many times to rerun the failures.
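
For illustration only, here is a minimal argparse sketch of what such a command line argument might look like; the flag name --rerun-failures is an assumption, not an existing Marionette option:

import argparse

parser = argparse.ArgumentParser()
# hypothetical option; in practice it would be added to the runner's existing parser
parser.add_argument('--rerun-failures', type=int, default=0, metavar='N',
                    help='rerun failed tests up to N more times (0 disables rerunning)')

args = parser.parse_args(['--rerun-failures', '1'])
print(args.rerun_failures)  # -> 1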

Ahal: I think I overheard you talking about rerunning failures for another harness this week (perhaps with Chris?). If so, what approach have you used or are you planning to use?
Flags: needinfo?(ahalberstadt)
Earlier in the week I proposed a way to add a rerun-failures option to xpcshell in bug 729098. A similar option was added to mochitests as a make target in bug 725112, and an option to run until failure in bug 875463. I think those are the relevant precedents in other harnesses. As far as I can tell, each of these is intended for use by a developer attempting to diagnose a problem locally, and not as part of running CI (the TBPL UI has a button to re-trigger a particular test job; this may be used in some cases to determine whether a test failure is intermittent).
(In reply to Chris Manchester [:chmanchester] from comment #4)
> Earlier in the week I proposed a way to add a rerun-failures option to
> xpcshell in bug 729098. A similar option was added to mochitests as a make
> target in bug 725112, and an option to run until failure in bug 875463. I
> think those are the relevant precedents in other harnesses. As far as I can
> tell, each of these is intended for use by a developer attempting to
> diagnose a problem locally, and not as part of running CI (the TBPL UI has
> a button to re-trigger a particular test job; this may be used in some
> cases to determine whether a test failure is intermittent).

Does the TBPL retrigger cause all tests to rerun? We're talking about a 2hr+ test run in the case of the Gaia functional UI tests on a device, and devices are limited, so having the option to just rerun the failures would be ideal.

Perhaps the best solution would be to optionally write out a manifest containing the failed tests, which could be archived by the CI and downloaded for running locally, but not automatically rerun. This means we're not building failure tolerance into our test suites, and leaving that up to the observer of the results.
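
As an illustrative sketch only (the failures.ini file name and the helper are made up, and this assumes we already have the relative paths of the failed tests), writing such a manifest at the end of a run could be as simple as:

import os

def write_failures_manifest(failed_paths, path='failures.ini'):
    # failed_paths: relative paths of failed tests, e.g. ['unit/test_permissions.py']
    # each test becomes one [section] in manifestparser's .ini format
    with open(path, 'w') as f:
        for test_path in failed_paths:
            f.write('[%s]\n' % test_path)
    return os.path.abspath(path)

The CI could then archive the resulting file alongside the other reports, and anyone investigating the failures could download it and pass it back to the runner.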
Flags: needinfo?(ahalberstadt)
No, I was just mentioning that the mochitest mach command had an option to re-run failures. On further inspection it looks like you have to run the command a second time immediately after a run that had failures in it.

So actually, it basically does what you suggested in your last comment.

http://mxr.mozilla.org/mozilla-central/source/testing/mochitest/mach_commands.py#153
Changing the bug summary to reflect that we want an option to output a test manifest containing the failed tests.
Summary: Enhance marionette/gaia test runner to automatically run failed tests → Provide a command line option to output a manifest of test failures
I found myself wanting this feature today whilst investigating some failures on a release branch. Jonathan: Do you think this would be a candidate for a good first bug? I suspect we would need to somehow append the entire manifest entry for each failed test to the test runner object, and create a new manifest file from these entries. I think at the moment we're only preserving the test path after filtering.
Flags: needinfo?(jgriffin)
This might be a dupe of bug 883865.  Basically, it would be nice for manifestparser to have the ability to write a manifest comprising a subset of tests from an input manifest.  In this case, we wouldn't have to track anything in Marionette; we could just pass manifestparser the original manifest plus a list of failed tests, and manifestparser would do the right thing.
Flags: needinfo?(jgriffin)
I think it's a little different. We already have the ability in manifestparser to write a manifest, but I think bug 883865 is about doing so in JSON format. I can successfully write the manifest out at the end of a test run, but the challenge is filtering out the successful and disabled tests. We maintain a list of failures; however, the format of these entries isn't really usable for filtering the manifest.

For example, the test names from the manifest would be:

> >>> [t['name'] for t in manifest.tests]
> ['unit/test_cold_launch.py', 'unit/test_permissions.py']

But the contents of self.failures is:

> >>> [f[0] for f in self.failures]
> ['test_permissions.py test_permissions.TestPermissions.test_get_and_set_permission']

I think we'd need a 1:1 mapping between the two to filter the manifest before writing it out to disk. Do you have any other thoughts on this Jonathan?
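
Purely as an illustration of one possible mapping (the matching rule here, keying on the failing file's basename, is only a guess at what might be good enough; the names are taken from the examples above):

import os

manifest_names = ['unit/test_cold_launch.py', 'unit/test_permissions.py']
failures = ['test_permissions.py test_permissions.TestPermissions.test_get_and_set_permission']

# the first token of each failure entry is the test file name, e.g. 'test_permissions.py'
failed_files = {f.split()[0] for f in failures}

failed_manifest_entries = [name for name in manifest_names
                           if os.path.basename(name) in failed_files]
print(failed_manifest_entries)  # -> ['unit/test_permissions.py']
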
Flags: needinfo?(jgriffin)
(In reply to Dave Hunt (:davehunt) from comment #10)
> I think we'd need a 1:1 mapping between the two to filter the manifest
> before writing it out to disk. Do you have any other thoughts on this
> Jonathan?

I agree.  One approach would be to make Marionette track tests differently; instead of generating relatively arbitrary tuples for test failures (as we do now), we could create a Test object for each test it runs (a la moztest) and update its status at execution time. The Test object might contain the relative path to the test as an attribute, which could be used later when generating manifests.
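
A very rough sketch of what such an object might look like; this is not moztest's actual API, and the class and attribute names are assumptions:

class MarionetteTest(object):
    """Hypothetical per-test record, loosely modelled on moztest's result classes."""

    def __init__(self, relpath):
        self.relpath = relpath   # relative path to the test file, usable later for manifest output
        self.status = 'NOTRUN'   # updated at execution time, e.g. 'PASS' or 'FAIL'


# usage sketch: collect the relative paths of failed tests for a new manifest
tests = [MarionetteTest('unit/test_cold_launch.py'),
         MarionetteTest('unit/test_permissions.py')]
tests[1].status = 'FAIL'
failed_paths = [t.relpath for t in tests if t.status == 'FAIL']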

Although I haven't looked at the patch for bug 956739 closely, I imagine it may impact how we go about this, so we may want to wait on that.
Flags: needinfo?(jgriffin)
Makes sense, blocking on bug 956739.
Depends on: 956739
[mass update] Setting Harness bugs to all P3
Priority: -- → P3
We haven’t gotten around to this in five years, so we probably never will.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
Product: Testing → Remote Protocol