Open Bug 1354042 Opened 8 years ago Updated 3 years ago

ensure pending crash dumps are removed between tests

Categories

(Testing :: General, enhancement, P3)

Product:

Component:

Type:

enhancement

Priority:

P3

Severity:

S3

Tracking

(firefox55 fixed)

Status:

REOPENED

Milestone:

mozilla55

Tracking Flags:

Tracking

Status

firefox55

---

fixed

People

(Reporter: jmaher, Unassigned)

References

Details

Attachments

(3 files)

delete pending crash reports between reftests 8 years ago Geoff Brown [:gbrown] 3.40 KB, patch	jmaher : review+	Details \| Diff \| Splinter Review
delete pending crash reports between browser-flavor mochitests 8 years ago Geoff Brown [:gbrown] 1.28 KB, patch	jmaher : review+	Details \| Diff \| Splinter Review
delete pending crash reports between regular and chrome mochitests 8 years ago Geoff Brown [:gbrown] 4.66 KB, patch	jmaher : review+	Details \| Diff \| Splinter Review

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Description

•

8 years ago

we suspect that crash artifacts are causing issues when left behind (outside of the firefox profile) and we start a new browser session with a fresh profile- it is likely suspect that we get a notification in the UI indicating there are crash reports to submit, etc. I propose removing all related files

Comment 1

•

8 years ago

FWIW, I think the respective harnesses (that actually run the browser) should do this between running individual tests, so we can correctly attribute content crashes to the tests that caused them. In this case we should probably have a test manifest annotation that allows for N content-crashes for a given test, both to (a) deal with the probably existant crashes and (b) deal with tests that deliberately check what happens if crashing a content process.

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 2

•

8 years ago

I would agree the test harnesses themselves should clean up, not the browser. If we assume a rm -Rf <dir> before a test run, would that be ok and allow any test that crashes on purpose or accident to properly cleanup? I just want to make sure we have 100% accuracy when it comes to running the tests and assume that our crashreporting works as intended.

Comment 3

•

8 years ago

(In reply to Joel Maher ( :jmaher) from comment #2) > I would agree the test harnesses themselves should clean up, not the > browser. If we assume a rm -Rf <dir> before a test run, would that be ok > and allow any test that crashes on purpose or accident to properly cleanup? I think so. But I'm concerned that we apparently have repetitive content crashes that we're not catching some other way. As in, for bug 1267480, I'd expect the content crashes in the fingerprinting tests to trigger orange themselves, not come to our attention because other tests that run later go a bit wonky because of UI we display to submit the crashes...

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 4

•

8 years ago

Great point

Geoff Brown [:gbrown]

Comment 5

•

8 years ago

Are we talking about pending crash reports, as in bug 1274395?

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 6

•

8 years ago

yes, that sounds like the right thing; I wonder if we can do a ls on the [uappdir] and output the data to a log file- then push to try and see what shows up?

Geoff Brown [:gbrown]

Comment 7

•

8 years ago

I checked several recent logs for bug 1267480 (browser_security.js) failures. runtests.py is started with --cleanup-crashes and does not report deleting any pending crash reports. If that mechanism is working, that means any problematic pending crash reports are being created during that runtests.py run. It is perhaps an interesting coincidence that browser/modules/test/browser/browser_UnsubmittedCrashHandler.js creates pending crash reports, in a "fake" UAppDir, and that typically runs in the same job as browser_security.js...but browser_UnsubmittedCrashHandler.js runs *after* browser_security.js ... so I can't imagine it is related.

Geoff Brown [:gbrown]

Comment 8

•

8 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=64fdc7f4d8f52cb4e37532bdf3b2ffc7c6040e7e

Geoff Brown [:gbrown]

Comment 9

•

8 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=31f2c2bcceac4e77a2366a27c4763ccc611205c8

Geoff Brown [:gbrown]

Comment 10

•

8 years ago

Comment 9 shows that several mochitest-bc tests leave pending crash reports behind: browser/base/content/test/plugins/browser_CTP_crashreporting.js browser/base/content/test/tabcrashed/browser_clearEmail.js browser/modules/test/browser/browser_UnsubmittedCrashHandler.js toolkit/components/perfmonitoring/tests/browser/browser_compartments.js toolkit/mozapps/extensions/test/xpinstall/browser_amosigned_trigger.js I think the harness should delete pending crash reports after every test, and report that it is doing so. As in bug 1274395, that should only be done when the --cleanup-crashes flag is specified, to avoid cleaning up developers' local UAppData files. We might also fail any test that leaves pending crash reports, but that would require some investigation into the existing cases (remember, I've only looked at mochitest-bc on linux so far!).

Assignee: nobody → gbrown

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 11

•

8 years ago

adding :jgraham and :whimboo so they are aware this is a problem and any harnesses that they work on should ensure crashes from previous tests and browser sessions are cleaned up in all places outside of the firefox profile.

Geoff Brown [:gbrown]

Updated

•

8 years ago

Blocks: 1353894

Geoff Brown [:gbrown]

Updated

•

8 years ago

Summary: ensure all crash artifacts are removed between browser sessions → ensure pending crash dumps are removed between tests

Geoff Brown [:gbrown]

Comment 12

•

8 years ago

Attached patch delete pending crash reports between reftests — Details — Splinter Review

I'll create separate patches for each test harness. I'm starting with reftest since that seems the easiest! I've run this locally with and without --cleanup-crashes and pushed to try with logging. Final check running at https://treeherder.mozilla.org/#/jobs?repo=try&revision=d1b4a12495c57416f25c4846c6e652f212d94a82. I have considered failing the test if it leaves pending crash reports behind, but I am not sure that is appropriate, and that might introduce multiple new failures, delaying deployment of this fix. Let's follow up in a separate bug if that behavior is desired. For now, deleting pending crash reports should shelter later tests from any ill effects.

Attachment #8855908 - Flags: review?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 13

•

8 years ago

Comment on attachment 8855908 [details] [diff] [review] delete pending crash reports between reftests Review of attachment 8855908 [details] [diff] [review]: ----------------------------------------------------------------- do we need to ensure --cleanup-crashes is passed into the reftest/crashtest/jsreftest harnesses via mozharness? ::: layout/tools/reftest/reftest.jsm @@ +1839,5 @@ > + while (entries.hasMoreElements()) { > + let file = entries.getNext().QueryInterface(CI.nsIFile); > + if (file.isFile()) { > + file.remove(false); > + logger.info("This test left pending crash dumps; deleted "+file.path); should we make this something more severe so the test fails?

Attachment #8855908 - Flags: review?(jmaher) → review+

Geoff Brown [:gbrown]

Comment 14

•

8 years ago

(In reply to Joel Maher ( :jmaher) from comment #13) > do we need to ensure --cleanup-crashes is passed into the > reftest/crashtest/jsreftest harnesses via mozharness? That was added in bug 1274395. Recent logs verify it is still in use. > > + logger.info("This test left pending crash dumps; deleted "+file.path); > > should we make this something more severe so the test fails? I don't *think* so. If the crash dumps are cleaned up, no harm done to the rest of the run. On the other hand, it *might* indicate an unexpected condition in the test that caused the pending crash report. I feel like this is scope creep though. If someone wants to detect pending crash reports, lets open a separate bug.

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 15

•

8 years ago

yes, I agree- if we want to fail on pending crashreports per test, then lets file a bug and discuss the details there- this right now gets us to more reliable tests

Geoff Brown [:gbrown]

Comment 16

•

8 years ago

Attached patch delete pending crash reports between browser-flavor mochitests — Details — Splinter Review

Same code here, now for browser mochitests. This one is important since we know there are several such tests which leave pending crash reports, and those crash reports interfere with later tests. This seems like a good place to clean up, but there may be a better choice. I think I'll need to handle plain mochitests separately...does that seem right?

Attachment #8855929 - Flags: review?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 17

•

8 years ago

Comment on attachment 8855929 [details] [diff] [review] delete pending crash reports between browser-flavor mochitests Review of attachment 8855929 [details] [diff] [review]: ----------------------------------------------------------------- is there any way we can have a unittest for this? maybe a simple test that adds a file there and we can verify on cleanup that we get a message? I don't think it would be trivial, but it would be nice to have.

Attachment #8855929 - Flags: review?(jmaher) → review+

Comment 18

•

8 years ago

Pushed by gbrown@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/c101d83ac3ed Delete pending crash reports between browser mochitest tests; r=jmaher https://hg.mozilla.org/integration/mozilla-inbound/rev/7a19e5e6bce3 Delete pending crash reports between reftests; r=jmaher

Geoff Brown [:gbrown]

Comment 19

•

8 years ago

More test suites to come...

Keywords: leave-open

Geoff Brown [:gbrown]

Comment 20

•

8 years ago

Attached patch delete pending crash reports between regular and chrome mochitests — Details — Splinter Review

https://treeherder.mozilla.org/#/jobs?repo=try&revision=62fd57a1744701ca942fae4f1f706795aa4e4e89

Attachment #8856056 - Flags: review?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Updated

•

8 years ago

Attachment #8856056 - Flags: review?(jmaher) → review+

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 21

•

8 years ago

one other thought I had is that why when we moved test dirs to different chunks did tests in an unrelated directory start failing? Right now we are under the premise that they were failing because of a pending crash- but if the same set of tests ran in the same browser session, then we would already have failures before shifting directories between chunks. Possibly there are other reasons why we had failures related to bug 1353894?

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Comment 22

•

8 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/c101d83ac3ed https://hg.mozilla.org/mozilla-central/rev/7a19e5e6bce3

Comment 23

•

8 years ago

Pushed by gbrown@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/3c60791c6567 Delete pending crash reports between mochitests; r=jmaher

Geoff Brown [:gbrown]

Comment 24

•

8 years ago

I doubt there is any point in deleting pending crash reports between non-browser tests (xpcshell, gtest, cppunit). Hopefully this is all that is needed.

Keywords: leave-open

Geoff Brown [:gbrown]

Comment 25

•

8 years ago

(In reply to Joel Maher ( :jmaher) from comment #21) > one other thought I had is that why when we moved test dirs to different > chunks did tests in an unrelated directory start failing? Right now we are > under the premise that they were failing because of a pending crash- but if > the same set of tests ran in the same browser session, then we would already > have failures before shifting directories between chunks. > > Possibly there are other reasons why we had failures related to bug 1353894? Possibly, but I think this is a likely explanation. I suspect that lots of tests are unaffected by the presence of pending crash reports. Only the combination of a pending-crash-report-sensitive test running after a test that leaves pending crash reports creates a problem. When that combination is first introduced within a single directory of tests, that should produce a failure that is easy to fix by backout; when that combination comes about from a reordering of tests, it's hard to assign blame and the problem persists. Or maybe there's another factor...!

James Graham [:jgraham]

Comment 26

•

8 years ago

I don't entirely understand the issue here, but it seems pretty suspect that this would be required for mochitest-plain and reftests but not web-platform-tests. Also what about marionette tests (and related things like Firefox UI tests)?

Carsten Book [:Tomcat]

Comment 27

•

8 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/3c60791c6567

Status: NEW → RESOLVED

Closed: 8 years ago

status-firefox55: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla55

Henrik Skupin [:whimboo][⌚️UTC+1]

Comment 28

•

8 years ago

(In reply to James Graham [:jgraham] from comment #26) > I don't entirely understand the issue here, but it seems pretty suspect that > this would be required for mochitest-plain and reftests but not > web-platform-tests. Also what about marionette tests (and related things > like Firefox UI tests)? Marionette related tests which are run by a binary through Marionette are using mozcrash to detect crashes. AFAIR mozcrash itself moves them away, but only if MINIDUMP_SAVE_PATH has been specified. So not sure which harnesses are actually doing that. Ted might now. https://dxr.mozilla.org/mozilla-central/source/testing/mozbase/mozcrash/mozcrash/mozcrash.py#307 Marionette: https://dxr.mozilla.org/mozilla-central/rev/45692c884fdd5136a64fb2f8a61a0c8183b69331/testing/marionette/client/marionette_driver/marionette.py#779 https://dxr.mozilla.org/mozilla-central/source/testing/mozbase/mozrunner/mozrunner/base/runner.py#196 I would suggest that we make sure to have MINIDUMP_SAVE_PATH set for ANY of our test jobs in TC and Buildbot.

Flags: needinfo?(ted)

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 29

•

8 years ago

I think there is pending work here for marionette and wpt harnesses. :gbrown, per my earlier comment 21, I still believe that when we shifted directories to different chunks, we had failures in a directory that didn't have any failures. Maybe our test is to roll back to specific revisions, apply your fix and push to try. The case was that in bug 1267480, we saw failures in browser_security: browser/components/preferences/in-content/tests/browser_security.js we landed a fix to split it up and that moved it to a different chunk where we saw permanent failures in: browser/components/resistfingerprinting/test/browser/browser_roundedWindow_newWindow.js a link to treeherder of the original landing and failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=28f8a3519d733370c1f6a848a243f05ade19663d&selectedJob=88750338 In this case, could we update to 28f8a3519d73, apply the fix for mochitest-browser-chrome and push to try with: |mach try -b o -p macosx64 -u mochitest-e10s-bc -t none| that would prove that we have solved this case- or maybe enlighten us to more work to do. It could be I am missing the obvious :)

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Geoff Brown [:gbrown]

Comment 30

•

8 years ago

(In reply to Henrik Skupin (:whimboo) from comment #28) > (In reply to James Graham [:jgraham] from comment #26) > > I don't entirely understand the issue here, but it seems pretty suspect that > > this would be required for mochitest-plain and reftests but not > > web-platform-tests. Also what about marionette tests (and related things > > like Firefox UI tests)? > > Marionette related tests which are run by a binary through Marionette are > using mozcrash to detect crashes. AFAIR mozcrash itself moves them away, but > only if MINIDUMP_SAVE_PATH has been specified. So not sure which harnesses > are actually doing that. Ted might now. > > https://dxr.mozilla.org/mozilla-central/source/testing/mozbase/mozcrash/ > mozcrash/mozcrash.py#307 > > Marionette: > https://dxr.mozilla.org/mozilla-central/rev/ > 45692c884fdd5136a64fb2f8a61a0c8183b69331/testing/marionette/client/ > marionette_driver/marionette.py#779 > https://dxr.mozilla.org/mozilla-central/source/testing/mozbase/mozrunner/ > mozrunner/base/runner.py#196 > > I would suggest that we make sure to have MINIDUMP_SAVE_PATH set for ANY of > our test jobs in TC and Buildbot. MINIDUMP_SAVE_PATH is being set. mozcrash generally deals with "regular" minidumps, but not "pending crash reports"; the exception is https://dxr.mozilla.org/mozilla-central/rev/45692c884fdd5136a64fb2f8a61a0c8183b69331/testing/mozbase/mozcrash/mozcrash/mozcrash.py#515, which deletes pending crash reports, but is only called at the start of a test run. The problem with pending crash reports is that, if present, the browser may pop up a UI suggesting that the pending crash reports be submitted. Thus, one test can create a pending crash report, and a future test can be affected by the UI for the pending crash report. I am not entirely clear on the conditions necessary for the creation of a pending crash report. I have only found a handful of cases, all in mochitest-bc, most in crash related tests (comment 10). I am also not entirely clear on when the UI comes up, and of course, many tests can run unaffected even in the presence of such a UI. The patches I have landed in this bug check at the end of each mochitest or reftest and delete any pending crash reports found. I still don't know if similar changes are required for marionette or wpt, or where such changes would go.

Flags: needinfo?(ted)

Geoff Brown [:gbrown]

Comment 31

•

8 years ago

(In reply to Joel Maher ( :jmaher) from comment #29) > The case was that in bug 1267480, we saw failures in browser_security: > browser/components/preferences/in-content/tests/browser_security.js > > we landed a fix to split it up and that moved it to a different chunk where > we saw permanent failures in: > browser/components/resistfingerprinting/test/browser/ > browser_roundedWindow_newWindow.js > > > a link to treeherder of the original landing and failures: > https://treeherder.mozilla.org/#/ > jobs?repo=autoland&revision=28f8a3519d733370c1f6a848a243f05ade19663d&selected > Job=88750338 In that job, the failing test, browser_roundedWindow_newWindow.js runs after browser/base/content/test/tabcrashed/browser_clearEmail.js, which is known (comment 10) to leave pending crash reports. In the previous push, browser_roundedWindow_newWindow.js ran in bc2, https://treeherder.mozilla.org/#/jobs?repo=autoland&tochange=28f8a3519d733370c1f6a848a243f05ade19663d&fromchange=2159959522f4180bdca483162c9134beb108575a&filter-searchStr=osx%20opt%20bc%20e10s&selectedJob=88706974. browser_clearEmail.js did not run in that job. See also my comment 25, which is regrettably too generic.

Joel Maher ( :jmaher ) (UTC -8)

Reporter

Comment 32

•

8 years ago

thanks Geoff, I didn't want to overlook anything and I think this addresses my primary concern!

Henrik Skupin [:whimboo][⌚️UTC+1]

Comment 33

•

8 years ago

Don't we generally turn off the crash reporter for our tests as run in automation?

Geoff Brown [:gbrown]

Comment 34

•

8 years ago

(In reply to Henrik Skupin (:whimboo) from comment #33) > Don't we generally turn off the crash reporter for our tests as run in > automation? Yes, we usually set MOZ_CRASHREPORTER_NO_REPORT=1 in the environment, for example. It seems to not affect pending crash reports; maybe it should?

Henrik Skupin [:whimboo][⌚️UTC+1]

Comment 35

•

8 years ago

Well, turning off the crash reporter would mean that we do not generate and store crash reports. I wonder what's actually creating those pending crash reports then. Ted, could you give us some clarification please?

Flags: needinfo?(ted)

Geoff Brown [:gbrown]

Updated

•

8 years ago

See Also: → 1269998

(not currently active) Ted Mielczarek

Comment 36

•

8 years ago

(In reply to Henrik Skupin (:whimboo) from comment #35) > Well, turning off the crash reporter would mean that we do not generate and > store crash reports. I wonder what's actually creating those pending crash > reports then. > > Ted, could you give us some clarification please? We should be running all test harnesses with `MOZ_CRASHREPORTER_NO_REPORT=1`, which *should* make it so that no crashes ever get put into the pending directory. However, it looks like there are tests that want to test content process crash submission, and to do so they unset that environment variable: https://dxr.mozilla.org/mozilla-central/rev/8b854986038cf3f3f240697e27ef48ea65914c13/browser/base/content/test/tabcrashed/browser_clearEmail.js#12 It might be good to figure out a better API for tests to use here--perhaps we could have a one-time-use pref or other internal setting that these tests could use? The code that currently checks `MOZ_CRASHREPORTER_NO_REPORT` could then check that pref as well, and if set submit just the one crash report and then reset the pref. It's a little hairy right now because all of that happens inside nsExceptionHandler.cpp (search for callers of `ShouldReport`): https://dxr.mozilla.org/mozilla-central/rev/8b854986038cf3f3f240697e27ef48ea65914c13/toolkit/crashreporter/nsExceptionHandler.cpp#1526 Without fixing that I think gbrown's patch is the right approach--the harness checks for minidumps after each test run and cleans them up. One other thing we could do to make this less prone to error would be to add a way to override where pending dumps go when we do use this code path. Currently everything is set to use `UAppData` from the directory service, but we could easily add another env var or pref to set a different path to use for testing, which would at least mean that we wouldn't accidentally accumulate minidumps outside of the temporary directories we create for testing.

Flags: needinfo?(ted)

Geoff Brown [:gbrown]

Comment 37

•

8 years ago

Comment 36 notes interesting possibilities for improved handling of this issue - thanks! - but without any known related failures, I'm not finding the time to follow up. Leaving this bug open in case new problems arise or others are more motivated than I.

Assignee: gbrown → nobody

Tim Huang[:timhuang]

Updated

•

8 years ago

Whiteboard: [fingerprinting]

Ethan Tseng [:ethan]

Updated

•

7 years ago

Whiteboard: [fingerprinting]

Geoff Brown [:gbrown]

Updated

•

7 years ago

Priority: -- → P3

Updated

•

3 years ago

Severity: normal → S3

You need to log in before you can comment on or make changes to this bug.