Make the mochitest harness wait for crashes to be recorded before deleting them

RESOLVED FIXED in Firefox 57

Status

()

Toolkit
Crash Reporting
RESOLVED FIXED
9 months ago
6 months ago

People

(Reporter: gsvelto, Assigned: gsvelto)

Tracking

(Blocks: 2 bugs)

unspecified
mozilla58
Points:
---
Dependency tree / graph
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(firefox57 fixed, firefox58 fixed)

Details

MozReview Requests

()

Submitter Diff Changes Open Issues Last Updated
Loading...
Error loading review requests:

Attachments

(1 attachment)

(Assignee)

Description

9 months ago
+++ This bug was initially created as a clone of Bug #1323979 +++

Some of the mochitests that deliberately cause crashes delete the crash dump files themselves and some defer the deletion to the harness. Either way both tests usually race either against the CrashService, CrashManager or the CrashSubmit object which all manipulate the dump and extra files asynchronously. 

Some tests were manually fixed in bug 1319702 but there's more that have different type of races (some might even "leak" the .extra file into the following test) so the only proper solution is to have the harness deal with the files after all other consumers are done.
(Assignee)

Updated

9 months ago
Assignee: nobody → gsvelto
Status: NEW → ASSIGNED
Comment hidden (mozreview-request)
(Assignee)

Comment 3

8 months ago
Comment on attachment 8903084 [details]
Bug 1393800 - Have mochitests expecting crashes wait for the crashes to be recorded before clean up; .mielczarek

Redirecting review since Ted is on PTO and this happens almost exclusively outside of the core crash reporting code. This isn't urgent though so no pressure, there's a detailed description of the changes in the commit message.
Attachment #8903084 - Flags: review?(ted) → review?(mconley)
Comment on attachment 8903084 [details]
Bug 1393800 - Have mochitests expecting crashes wait for the crashes to be recorded before clean up; .mielczarek

https://reviewboard.mozilla.org/r/174816/#review191186

WFM! Thanks for this clean-up.

::: dom/ipc/tests/process_error.xul:34
(Diff revision 1)
> -      waitCrash.then(done);
> +      done();
>      }
>  
>      Services.obs.addObserver(crashObserver, 'ipc:content-shutdown');
>  
> -    document.getElementById('thebrowser')
> +    BrowserTestUtils.crashBrowser(document.getElementById('thebrowser'));

Nice de-duplication!

::: testing/specialpowers/content/SpecialPowersObserverAPI.js:401
(Diff revision 1)
>          }
>          return undefined; // See comment at the beginning of this function.
>        }
>  
> +      case "SPProcessCrashManagerWait": {
> +        let promises = aMessage.json.crashIds.map((crashId) => {

You should be able to refer to aMessage.data instead of aMessage.json. I believe .json is deprecated.

::: testing/specialpowers/content/specialpowersAPI.js:685
(Diff revision 1)
> +      let self = this;
> +      function messageListener(msg) {
> +        self._removeMessageListener("SPProcessCrashManagerWait", messageListener);
> +        resolve();
> +      }

You can avoid the `self` hack with a fat arrow:

```js
let messageListener = msg => {
  this._removeMessageListener("SPProcessCrashManagerWait", messageListener);
  resolve();
};

this._addMessageListener("SPProcessCrashManagerWait", messageListener);
```
Attachment #8903084 - Flags: review?(mconley) → review+
(Assignee)

Comment 5

8 months ago
While addressing your review comments I found that one of the clipboard mochitests is consistently failing in the ccov builds. I'm not sure why that is but I'll investigate before landing this. I haven't touched that particular test nor the dependencies it has  so I might have triggered an existing race I hadn't spotted before.
(Assignee)

Updated

8 months ago
Depends on: 227246
(Assignee)

Comment 6

7 months ago
Changing the bug dependency since bug 1406971 makes nsIProcess not rely on PR_CreateProcess() anymore and thus solves the issue I was having here.
Depends on: 1406971
No longer depends on: 227246
(Assignee)

Comment 7

7 months ago
I hoped to be able to land this today but I've just run into some brand new orange in the tests:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=4bbec042d5a6d0368e29e4f27f5b97ec8af8a166&group_state=expanded&selectedJob=136829548

It's a Windows failure and I don't have Windows build ready so I'll have to investigate it on Monday.
(Assignee)

Comment 8

7 months ago
After a lot of try runs I've verified that what I'm seeing is yet another genuine issue, here's another repro:

https://treeherder.mozilla.org/logviewer.html#?job_id=137449081&repo=try&lineNumber=2720

I'll investigate it and follow up, in the meantime I can't land the patch :(
(Assignee)

Comment 9

7 months ago
I've noticed a regular pattern in the intermittent failure I'm seeing. Every time it shows up in a run it's preceded by the following line:

15:35:38     INFO -  JavaScript error: resource://gre/modules/FormHistory.jsm, line 383: NS_ERROR_FAILURE: Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIProperties.get]

Due to the high number of similar errors we often spew out when running mochitests I hadn't paid attention to it but while I couldn't find a direct link between the two the correlation is very strong: I've got a half dozen runs that succeed and which don't have that message and a half dozen that fail and have it in exactly the same place.

I've dug into the stacks and what is happening is that the FormHistory is trying to purge old entries in response to the 'idle-daily' event. The database connection fails and in turn causes the error to show up. It's unclear to me if this is what's causing the test to hang but what's weirder still is that I don't see why that particular observer is fired during the tests.

I've prepared a new run which disables the 'idle-daily' notification entirely and I'm now waiting to see if it fixes my problem:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=a0a06e85d35e4c06f2775678973686273b6969e7
(Assignee)

Comment 10

7 months ago
Correction, the failure is not caused by the database connection, it's caused by the directory service returning an error. Since this happens in the teardown phase of a test I have the feeling that the profile directory has already been removed at that point.
(Assignee)

Updated

7 months ago
Depends on: 1409769

Comment 11

7 months ago
Pushed by gsvelto@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/b9a113790a15
Have mochitests expecting crashes wait for the crashes to be recorded before clean up; r=mconley
https://hg.mozilla.org/mozilla-central/rev/b9a113790a15
Status: ASSIGNED → RESOLVED
Last Resolved: 7 months ago
status-firefox58: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla58

Comment 13

7 months ago
bugherderuplift
https://hg.mozilla.org/releases/mozilla-beta/rev/4005ce92f03e
status-firefox57: --- → fixed
Flags: in-testsuite+

Updated

6 months ago
Blocks: 1416078
You need to log in before you can comment on or make changes to this bug.