Closed Bug 1319702 Opened 3 years ago Closed 3 years ago

CrashManager.addSubmissionAttempt() can race CrashManager.addCrash()

Categories

(Toolkit :: Crash Reporting, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla53
Tracking Status
firefox53 --- fixed

People

(Reporter: gsvelto, Assigned: gsvelto)

References

Details

Attachments

(1 file, 2 obsolete files)

This caused me no end of grief while testing the various patches needed for enabling client-side stack walking and I suspect it's responsible for a number of intermittent issues in tests dealing with crashes.

In a nutshell, the process of adding a crash to the crash manager is asynchronous however some of our code (e.g. CrashSubmit.submit()) may call CrashManager.addSubmissionAttempt() before CrashManager.addCrash() has finished. This usually throws since the CrashManager won't find the crash that was just submitted because it hasn't been stored yet.

Introducing a delay in CrashManager.addCrash() causes this to fail systematically. The following tests are affected by this:

dom/plugins/test/mochitest/test_crash_submit.xul
dom/plugins/test/mochitest/test_hang_submit.xul

More could be affected though; I'm now doing some extra testing to identify all of them.
Blocks: 1293656
Component: General → Breakpad Integration
Product: Firefox → Toolkit
These tests also seem to be affected:

browser/base/content/test/plugins/browser_CTP_crashreporting.js
browser/base/content/test/plugins/browser_pluginCrashCommentAndURL.js
browser/base/content/test/tabcrashed/browser_shown.js
browser/base/content/test/tabcrashed/browser_clearEmail.js
This patch fixes the potential races within the CrashManager and modifies the two tests that rely on its internals so that they're not racy either. All the tests I've mentioned in the previous comments are now robust and do not fail even if I inject artificial 5-second delays in addCrash()'s promise chain.

This is needed for bug 1293656 to work properly and it might have saved me a lot of re-triggers during the try runs if I had figured it out before :-/
Assignee: nobody → gsvelto
Status: NEW → ASSIGNED
Attachment #8813628 - Flags: review?(benjamin)
I've just noticed that CrashManager.addSubmissionAttempt() and CrashManager.addSubmissionResult() can also race WRT to each other (though it's very unlikely to happen) but more importantly test_hang_submit.xul checks synchronously for the submission record so it's also likely to fail intermittently on that.
Comment on attachment 8813628 [details] [diff] [review]
[PATCH] Ensure that the CrashManager public methods cannot race with addCrash() and fix the related tests

Clearing the review flag until I fix the other race I just found. I want to be sure I don't see any more orange related to this stuff.
Attachment #8813628 - Flags: review?(benjamin)
This patch also fixes the race in CrashSubmit, the try run is here:

https://hg.mozilla.org/try/rev/740382c7a5ecf95175d574236ec517d57e9eee37

Once the client-side stack-walking is behind me I'd really love to have some time to refactor this code so that dependencies between the various parts are explicit. There's too many moving parts that don't talk to each other or assume that things magically work without synchronization.
Attachment #8813628 - Attachment is obsolete: true
While testing this some more I've hit another even nastier race: sometimes CrashManager.addSubmissionAttempt() is called even before CrashManager.addCrash(). This is because the paths that execute both functions are both asynchronous and completely separate (they rely on events which are dispatched at different times). I'll have to rethink this code so that there's some form of high-level synchronization.
OK, this should work correctly and it works well with my patches for bug 1293656. The try run is here: https://treeherder.mozilla.org/#/jobs?repo=try&revision=baf7ff442379d8714d148c714e688eaff2723927
Attachment #8813669 - Attachment is obsolete: true
Attachment #8814175 - Flags: review?(benjamin)
Attachment #8814175 - Flags: review?(benjamin) → review+
Thanks for the review Benjamin! The green try run is here - https://treeherder.mozilla.org/#/jobs?repo=try&revision=2397bc80a3d987304d030ac447f93b526c7550a4 - plus I've manually tested to ensure it didn't introduce issues in the crash-submission process. Ready to land.
Pushed by gsvelto@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/5d8855350a72
Fix races in the CrashManager and CrashSubmit objects and in the related tests r=bsmedberg
https://hg.mozilla.org/mozilla-central/rev/5d8855350a72
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
Blocks: 1328657
You need to log in before you can comment on or make changes to this bug.