Mochitests should continue after a browser crash

REOPENED
Unassigned

Status

defect
REOPENED
7 years ago
6 years ago

People

(Reporter: ekr, Unassigned)

Tracking

Trunk
x86
macOS
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [WebRTC][blocking-webrtc-])

Attachments

(1 attachment)

Currently WebRTC is not stable enough to run a large number of consecutive
tests in a given version of Firefox.

For now, each test should run in its own Firefox instance. Given the number of tests we have now, (< 50), this shouldn't be a big deal in terms of performance cost.
The mochitest harness is not set up to do this at all.
I'm not committed to mochitest. Do we have anything else that would work?
We don't currently have any test harness that opens a new browser instance for each test. The closest thing is xpcshell, where we run a new xpcshell instance per-test.
I'm sorry this is likely a dumb question, but why aren't we making the code stable enough to run properly? Do we know why it's unstable? Running each test in its own instance of Firefox in automation would be incredibly slow and would adversely impact our end to end times.

Xpcshell is the only test framework that runs this way, but it doesn't start the entire browser each time so it's fast(er).
Sure, we're working on that.

But in the meantime, however, we want to have automated tests that the functionality works, even if it leaks resources.
(In reply to Eric Rescorla from comment #5)
> Sure, we're working on that.

Can you please reference bug numbers? I kinda would like to see that tracked somewhere. Also I'm not sure on which results your initial comment is based on. Right now we only have a single Mochitest checked-in. Why do we know that we will get those failures?
Whiteboard: [WebRTC][blocking-webrtc-]
Recapping discussion in #media: 

I made the comment we would be fine with a hack

Ted indicated he might restart the tests with the next item after a crash; and this might be useful in m-c eventually.

Sounds good to me.
Resummarizing.
Summary: Run each WebRTC mochitest in its own firefox instance → Mochitests should continue after a browser crash
This patch works for mochitest-plain, mochitest-chrome and mochitest-browser-chrome in my limited local testing. It could stand a push to Try to make sure it hasn't horribly broken anything. In addition, I meant to try adding tests that intentionally crash and do a try run with those as well to see what it looks like.

This isn't landable on mozilla-central as-is, because it will confuse buildbot/tbpl, since at least buildbot uses the end-of-run summary of pass/fail counts, and those won't be correct here. I don't have a great solution for that at the moment. Assuming this doesn't break anything we might be able to land this on alder for testing purposes.
I pushed this patch to try given that we have multiple tests now which are likely going to crash the browser. Lets see how it looks like with the patch applied:

https://tbpl.mozilla.org/?tree=Try&rev=0738352fecb1
Mostly all bc and oth mochitests fail with this tryserver build. Also for the WebRTC test we don't bring up the harness correctly - at least what I can see when skimming over:

https://tbpl.mozilla.org/php/getParsedLog.php?id=18095530&tree=Try#error0

I will try to reproduce it locally on my machine by tomorrow if I have the time.
Some things to note here:

1. We do not correctly report the right pass/fail count if a test crashes the application:

1:11.84 77 INFO TEST-INFO | /tests/dom/media/tests/mochitest/test_peerConnection_basicVideo.html | For now simply disconnect. We will add checks for media in a follow-up bug
 1:11.87 78 INFO TEST-INFO | /tests/dom/media/tests/mochitest/test_peerConnection_basicVideo.html | We can't run any checks and clean-up code due to a crash (see bug 820072)
 1:12.00 79 INFO TEST-END | /tests/dom/media/tests/mochitest/test_peerConnection_basicVideo.html | finished in 6581ms
 1:12.19 80 INFO TEST-START | Shutdown
 1:12.19 81 INFO Passed: 54
 1:12.20 82 INFO Failed: 0
 1:12.20 83 INFO Todo:   6
 1:12.20 84 INFO SimpleTest FINISHED
 1:12.20 85 INFO TEST-INFO | Ran 0 Loops
 1:12.20 86 INFO SimpleTest FINISHED

2. Given bug 821880 I experience a shutdown hang which doesn't get aborted. So the framework cannot restart the application. It blocks me from fully testing this patch.
Now that we've stabilized a large amount of the WebRTC crashes, I don't see value in doing this anymore. I'd rather us fail fast rather than continue a test due to a crash.

WONTFIX on that basis.
No longer blocks: webrtc-tests
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
I think you are confusing two issues:

1. How soon we get a failure report.
2. Whether we run other tests after a fail happens.

I agree we should get a failure report in tbpl (or whatever) as soon as
a single test fails. That doesn't mean that I don't want to run all the
other tests as well to see what fail.

I won't just REOPEN it, but we should discuss.
(In reply to Eric Rescorla (:ekr) from comment #14)
> I think you are confusing two issues:
> 
> 1. How soon we get a failure report.
> 2. Whether we run other tests after a fail happens.
> 
> I agree we should get a failure report in tbpl (or whatever) as soon as
> a single test fails. That doesn't mean that I don't want to run all the
> other tests as well to see what fail.
> 
> I won't just REOPEN it, but we should discuss.

Ah, I understand now. If I did a local run right now with one mochitest that causes a crash, it takes the other mochitests in the current run down with it, right?

I'll move this over to testing --> mochitest then.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Component: WebRTC → Mochitest
Product: Core → Testing
QA Contact: jsmith
Version: unspecified → Trunk
Yeah, I think this still has value even if we don't need it immediately for WebRTC work anymore.
I agree with Ted in comment 16 but we should note that the current TBPL does not even *know* the state of the job during processing. Once we have TBPL version 2 in place, we will have the ability to know what is happening during a test run state, and we can do something like what Eric mentions in comment 14 where we alert as soon as one test crashes.

The other thing that we'll need here is some hardening of Mochitest between tests - right now even running tests in different orders will cause unexpected failures, so if we are really going to enable mochitest to survive crashes, then we need to ensure the rest of the results we get from the run are actually valid and can be trusted, and that's not a statement I'd make right now with the current state of unknown interdependencies between our tests in mochitest.
You need to log in before you can comment on or make changes to this bug.