Open Bug 1354042 Opened 3 years ago Updated 2 years ago

ensure pending crash dumps are removed between tests

Categories

(Testing :: General, enhancement, P3)

enhancement

Tracking

(firefox55 fixed)

REOPENED
mozilla55
Tracking Status
firefox55 --- fixed

People

(Reporter: jmaher, Unassigned)

References

Details

Attachments

(3 files)

we suspect that crash artifacts are causing issues when left behind (outside of the firefox profile) and we start a new browser session with a fresh profile- it is likely suspect that we get a notification in the UI indicating there are crash reports to submit, etc.

I propose removing all related files
FWIW, I think the respective harnesses (that actually run the browser) should do this between running individual tests, so we can correctly attribute content crashes to the tests that caused them. In this case we should probably have a test manifest annotation that allows for N content-crashes for a given test, both to (a) deal with the probably existant crashes and (b) deal with tests that deliberately check what happens if crashing a content process.
I would agree the test harnesses themselves should clean up, not the browser.  If we assume a rm -Rf <dir> before a test run, would that be ok and allow any test that crashes on purpose or accident to properly cleanup?

I just want to make sure we have 100% accuracy when it comes to running the tests and assume that our crashreporting works as intended.
(In reply to Joel Maher ( :jmaher) from comment #2)
> I would agree the test harnesses themselves should clean up, not the
> browser.  If we assume a rm -Rf <dir> before a test run, would that be ok
> and allow any test that crashes on purpose or accident to properly cleanup?

I think so. But I'm concerned that we apparently have repetitive content crashes that we're not catching some other way. As in, for bug 1267480, I'd expect the content crashes in the fingerprinting tests to trigger orange themselves, not come to our attention because other tests that run later go a bit wonky because of UI we display to submit the crashes...
Great point
Are we talking about pending crash reports, as in bug 1274395?
yes, that sounds like the right thing; I wonder if we can do a ls on the [uappdir] and output the data to a log file- then push to try and see what shows up?
I checked several recent logs for bug 1267480 (browser_security.js) failures. runtests.py is started with --cleanup-crashes and does not report deleting any pending crash reports. If that mechanism is working, that means any problematic pending crash reports are being created during that runtests.py run.

It is perhaps an interesting coincidence that browser/modules/test/browser/browser_UnsubmittedCrashHandler.js creates pending crash reports, in a "fake" UAppDir, and that typically runs in the same job as browser_security.js...but browser_UnsubmittedCrashHandler.js runs *after* browser_security.js ... so I can't imagine it is related.
Comment 9 shows that several mochitest-bc tests leave pending crash reports behind:

browser/base/content/test/plugins/browser_CTP_crashreporting.js
browser/base/content/test/tabcrashed/browser_clearEmail.js
browser/modules/test/browser/browser_UnsubmittedCrashHandler.js
toolkit/components/perfmonitoring/tests/browser/browser_compartments.js
toolkit/mozapps/extensions/test/xpinstall/browser_amosigned_trigger.js

I think the harness should delete pending crash reports after every test, and report that it is doing so. As in bug 1274395, that should only be done when the --cleanup-crashes flag is specified, to avoid cleaning up developers' local UAppData files. 

We might also fail any test that leaves pending crash reports, but that would require some investigation into the existing cases (remember, I've only looked at mochitest-bc on linux so far!).
Assignee: nobody → gbrown
adding :jgraham and :whimboo so they are aware this is a problem and any harnesses that they work on should ensure crashes from previous tests and browser sessions are cleaned up in all places outside of the firefox profile.
Blocks: 1353894
Summary: ensure all crash artifacts are removed between browser sessions → ensure pending crash dumps are removed between tests
I'll create separate patches for each test harness. 

I'm starting with reftest since that seems the easiest! I've run this locally with and without --cleanup-crashes and pushed to try with logging. Final check running at https://treeherder.mozilla.org/#/jobs?repo=try&revision=d1b4a12495c57416f25c4846c6e652f212d94a82.

I have considered failing the test if it leaves pending crash reports behind, but I am not sure that is appropriate, and that might introduce multiple new failures, delaying deployment of this fix. Let's follow up in a separate bug if that behavior is desired. For now, deleting pending crash reports should shelter later tests from any ill effects.
Attachment #8855908 - Flags: review?(jmaher)
Comment on attachment 8855908 [details] [diff] [review]
delete pending crash reports between reftests

Review of attachment 8855908 [details] [diff] [review]:
-----------------------------------------------------------------

do we need to ensure --cleanup-crashes is passed into the reftest/crashtest/jsreftest harnesses via mozharness?

::: layout/tools/reftest/reftest.jsm
@@ +1839,5 @@
> +    while (entries.hasMoreElements()) {
> +        let file = entries.getNext().QueryInterface(CI.nsIFile);
> +        if (file.isFile()) {
> +          file.remove(false);
> +          logger.info("This test left pending crash dumps; deleted "+file.path);

should we make this something more severe so the test fails?
Attachment #8855908 - Flags: review?(jmaher) → review+
(In reply to Joel Maher ( :jmaher) from comment #13)
> do we need to ensure --cleanup-crashes is passed into the
> reftest/crashtest/jsreftest harnesses via mozharness?

That was added in bug 1274395. Recent logs verify it is still in use.

> > +          logger.info("This test left pending crash dumps; deleted "+file.path);
> 
> should we make this something more severe so the test fails?

I don't *think* so. If the crash dumps are cleaned up, no harm done to the rest of the run. On the other hand, it *might* indicate an unexpected condition in the test that caused the pending crash report. I feel like this is scope creep though. If someone wants to detect pending crash reports, lets open a separate bug.
yes, I agree- if we want to fail on pending crashreports per test, then lets file a bug and discuss the details there- this right now gets us to more reliable tests
Same code here, now for browser mochitests. This one is important since we know there are several such tests which leave pending crash reports, and those crash reports interfere with later tests.

This seems like a good place to clean up, but there may be a better choice. I think I'll need to handle plain mochitests separately...does that seem right?
Attachment #8855929 - Flags: review?(jmaher)
Comment on attachment 8855929 [details] [diff] [review]
delete pending crash reports between browser-flavor mochitests

Review of attachment 8855929 [details] [diff] [review]:
-----------------------------------------------------------------

is there any way we can have a unittest for this?  maybe a simple test that adds a file there and we can verify on cleanup that we get a message?  I don't think it would be trivial, but it would be nice to have.
Attachment #8855929 - Flags: review?(jmaher) → review+
Pushed by gbrown@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/c101d83ac3ed
Delete pending crash reports between browser mochitest tests; r=jmaher
https://hg.mozilla.org/integration/mozilla-inbound/rev/7a19e5e6bce3
Delete pending crash reports between reftests; r=jmaher
More test suites to come...
Keywords: leave-open
Attachment #8856056 - Flags: review?(jmaher) → review+
one other thought I had is that why when we moved test dirs to different chunks did tests in an unrelated directory start failing?  Right now we are under the premise that they were failing because of a pending crash- but if the same set of tests ran in the same browser session, then we would already have failures before shifting directories between chunks.

Possibly there are other reasons why we had failures related to bug 1353894?
Pushed by gbrown@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/3c60791c6567
Delete pending crash reports between mochitests; r=jmaher
I doubt there is any point in deleting pending crash reports between non-browser tests (xpcshell, gtest, cppunit). Hopefully this is all that is needed.
Keywords: leave-open
(In reply to Joel Maher ( :jmaher) from comment #21)
> one other thought I had is that why when we moved test dirs to different
> chunks did tests in an unrelated directory start failing?  Right now we are
> under the premise that they were failing because of a pending crash- but if
> the same set of tests ran in the same browser session, then we would already
> have failures before shifting directories between chunks.
> 
> Possibly there are other reasons why we had failures related to bug 1353894?

Possibly, but I think this is a likely explanation.

I suspect that lots of tests are unaffected by the presence of pending crash reports. Only the combination of a pending-crash-report-sensitive test running after a test that leaves pending crash reports creates a problem. When that combination is first introduced within a single directory of tests, that should produce a failure that is easy to fix by backout; when that combination comes about from a reordering of tests, it's hard to assign blame and the problem persists.

Or maybe there's another factor...!
I don't entirely understand the issue here, but it seems pretty suspect that this would be required for mochitest-plain and reftests but not web-platform-tests. Also what about marionette tests (and related things like Firefox UI tests)?
https://hg.mozilla.org/mozilla-central/rev/3c60791c6567
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
(In reply to James Graham [:jgraham] from comment #26)
> I don't entirely understand the issue here, but it seems pretty suspect that
> this would be required for mochitest-plain and reftests but not
> web-platform-tests. Also what about marionette tests (and related things
> like Firefox UI tests)?

Marionette related tests which are run by a binary through Marionette are using mozcrash to detect crashes. AFAIR mozcrash itself moves them away, but only if MINIDUMP_SAVE_PATH has been specified. So not sure which harnesses are actually doing that. Ted might now.

https://dxr.mozilla.org/mozilla-central/source/testing/mozbase/mozcrash/mozcrash/mozcrash.py#307

Marionette:
https://dxr.mozilla.org/mozilla-central/rev/45692c884fdd5136a64fb2f8a61a0c8183b69331/testing/marionette/client/marionette_driver/marionette.py#779
https://dxr.mozilla.org/mozilla-central/source/testing/mozbase/mozrunner/mozrunner/base/runner.py#196

I would suggest that we make sure to have MINIDUMP_SAVE_PATH set for ANY of our test jobs in TC and Buildbot.
Flags: needinfo?(ted)
I think there is pending work here for marionette and wpt harnesses.

:gbrown, per my earlier comment 21, I still believe that when we shifted directories to different chunks, we had failures in a directory that didn't have any failures.  Maybe our test is to roll back to specific revisions, apply your fix and push to try.

The case was that in bug 1267480, we saw failures in browser_security:
browser/components/preferences/in-content/tests/browser_security.js

we landed a fix to split it up and that moved it to a different chunk where we saw permanent failures in:
browser/components/resistfingerprinting/test/browser/browser_roundedWindow_newWindow.js


a link to treeherder of the original landing and failures:
https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=28f8a3519d733370c1f6a848a243f05ade19663d&selectedJob=88750338


In this case, could we update to 28f8a3519d73, apply the fix for mochitest-browser-chrome and push to try with:
|mach try -b o -p macosx64 -u mochitest-e10s-bc -t none|

that would prove that we have solved this case- or maybe enlighten us to more work to do.

It could be I am missing the obvious :)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to Henrik Skupin (:whimboo) from comment #28)
> (In reply to James Graham [:jgraham] from comment #26)
> > I don't entirely understand the issue here, but it seems pretty suspect that
> > this would be required for mochitest-plain and reftests but not
> > web-platform-tests. Also what about marionette tests (and related things
> > like Firefox UI tests)?
> 
> Marionette related tests which are run by a binary through Marionette are
> using mozcrash to detect crashes. AFAIR mozcrash itself moves them away, but
> only if MINIDUMP_SAVE_PATH has been specified. So not sure which harnesses
> are actually doing that. Ted might now.
> 
> https://dxr.mozilla.org/mozilla-central/source/testing/mozbase/mozcrash/
> mozcrash/mozcrash.py#307
> 
> Marionette:
> https://dxr.mozilla.org/mozilla-central/rev/
> 45692c884fdd5136a64fb2f8a61a0c8183b69331/testing/marionette/client/
> marionette_driver/marionette.py#779
> https://dxr.mozilla.org/mozilla-central/source/testing/mozbase/mozrunner/
> mozrunner/base/runner.py#196
> 
> I would suggest that we make sure to have MINIDUMP_SAVE_PATH set for ANY of
> our test jobs in TC and Buildbot.

MINIDUMP_SAVE_PATH is being set.

mozcrash generally deals with "regular" minidumps, but not "pending crash reports"; the exception is https://dxr.mozilla.org/mozilla-central/rev/45692c884fdd5136a64fb2f8a61a0c8183b69331/testing/mozbase/mozcrash/mozcrash/mozcrash.py#515, which deletes pending crash reports, but is only called at the start of a test run.

The problem with pending crash reports is that, if present, the browser may pop up a UI suggesting that the pending crash reports be submitted. Thus, one test can create a pending crash report, and a future test can be affected by the UI for the pending crash report.

I am not entirely clear on the conditions necessary for the creation of a pending crash report. I have only found a handful of cases, all in mochitest-bc, most in crash related tests (comment 10). I am also not entirely clear on when the UI comes up, and of course, many tests can run unaffected even in the presence of such a UI.

The patches I have landed in this bug check at the end of each mochitest or reftest and delete any pending crash reports found.

I still don't know if similar changes are required for marionette or wpt, or where such changes would go.
Flags: needinfo?(ted)
(In reply to Joel Maher ( :jmaher) from comment #29)
> The case was that in bug 1267480, we saw failures in browser_security:
> browser/components/preferences/in-content/tests/browser_security.js
> 
> we landed a fix to split it up and that moved it to a different chunk where
> we saw permanent failures in:
> browser/components/resistfingerprinting/test/browser/
> browser_roundedWindow_newWindow.js
> 
> 
> a link to treeherder of the original landing and failures:
> https://treeherder.mozilla.org/#/
> jobs?repo=autoland&revision=28f8a3519d733370c1f6a848a243f05ade19663d&selected
> Job=88750338

In that job, the failing test, browser_roundedWindow_newWindow.js runs after browser/base/content/test/tabcrashed/browser_clearEmail.js, which is known (comment 10) to leave pending crash reports.

In the previous push, browser_roundedWindow_newWindow.js ran in bc2, https://treeherder.mozilla.org/#/jobs?repo=autoland&tochange=28f8a3519d733370c1f6a848a243f05ade19663d&fromchange=2159959522f4180bdca483162c9134beb108575a&filter-searchStr=osx%20opt%20bc%20e10s&selectedJob=88706974. browser_clearEmail.js did not run in that job.

See also my comment 25, which is regrettably too generic.
thanks Geoff, I didn't want to overlook anything and I think this addresses my primary concern!
Don't we generally turn off the crash reporter for our tests as run in automation?
(In reply to Henrik Skupin (:whimboo) from comment #33)
> Don't we generally turn off the crash reporter for our tests as run in
> automation?

Yes, we usually set MOZ_CRASHREPORTER_NO_REPORT=1 in the environment, for example. It seems to not affect pending crash reports; maybe it should?
Well, turning off the crash reporter would mean that we do not generate and store crash reports. I wonder what's actually creating those pending crash reports then.

Ted, could you give us some clarification please?
Flags: needinfo?(ted)
See Also: → 1269998
(In reply to Henrik Skupin (:whimboo) from comment #35)
> Well, turning off the crash reporter would mean that we do not generate and
> store crash reports. I wonder what's actually creating those pending crash
> reports then.
> 
> Ted, could you give us some clarification please?

We should be running all test harnesses with `MOZ_CRASHREPORTER_NO_REPORT=1`, which *should* make it so that no crashes ever get put into the pending directory. However, it looks like there are tests that want to test content process crash submission, and to do so they unset that environment variable:
https://dxr.mozilla.org/mozilla-central/rev/8b854986038cf3f3f240697e27ef48ea65914c13/browser/base/content/test/tabcrashed/browser_clearEmail.js#12

It might be good to figure out a better API for tests to use here--perhaps we could have a one-time-use pref or other internal setting that these tests could use? The code that currently checks `MOZ_CRASHREPORTER_NO_REPORT` could then check that pref as well, and if set submit just the one crash report and then reset the pref. It's a little hairy right now because all of that happens inside nsExceptionHandler.cpp (search for callers of `ShouldReport`):
https://dxr.mozilla.org/mozilla-central/rev/8b854986038cf3f3f240697e27ef48ea65914c13/toolkit/crashreporter/nsExceptionHandler.cpp#1526

Without fixing that I think gbrown's patch is the right approach--the harness checks for minidumps after each test run and cleans them up.

One other thing we could do to make this less prone to error would be to add a way to override where pending dumps go when we do use this code path. Currently everything is set to use `UAppData` from the directory service, but we could easily add another env var or pref to set a different path to use for testing, which would at least mean that we wouldn't accidentally accumulate minidumps outside of the temporary directories we create for testing.
Flags: needinfo?(ted)
Comment 36 notes interesting possibilities for improved handling of this issue - thanks! - but without any known related failures, I'm not finding the time to follow up. 

Leaving this bug open in case new problems arise or others are more motivated than I.
Assignee: gbrown → nobody
Whiteboard: [fingerprinting]
Whiteboard: [fingerprinting]
Priority: -- → P3
You need to log in before you can comment on or make changes to this bug.