Closed Bug 1242119 Opened 4 years ago Closed 4 years ago

Intermittent xpcshell-child-process.ini:dom/indexedDB/test/unit/test_transaction_abort.js,test_multientry.js,test_blocked_order.js | application crashed [@ moz_abort]

Categories

(Core :: Gecko Profiler, defect)

Tracking

()

RESOLVED FIXED
mozilla47
Tracking Status
firefox47 --- fixed

People

(Reporter: philor, Assigned: mccr8)

Details

(Keywords: intermittent-failure, Whiteboard: [MemShrink])

Attachments

(2 files)

Summary: Intermittent xpcshell-child-process.ini:dom/indexedDB/test/unit/test_transaction_abort.js | application crashed [@ moz_abort] → Intermittent xpcshell-child-process.ini:dom/indexedDB/test/unit/test_transaction_abort.js,test_multientry.js | application crashed [@ moz_abort]
It looks like we're destroying a Mutex too late, and the deadlock detector is already shut down. Should this be using OffTheBooksMutex?
Component: DOM: IndexedDB → Gecko Profiler
Flags: needinfo?(nfroyd)
Flags: needinfo?(bgirard)
I assume this is sRegisteredThreadsMutex, which we're so eager to destroy, we try to do it twice:

http://mxr.mozilla.org/mozilla-central/source/tools/profiler/core/platform.cpp#139

But on Windows, we go through:

http://mxr.mozilla.org/mozilla-central/source/xpcom/build/XPCOMInit.cpp#971

which runs before profiler_shutdown() and thereby destroys the Mutex via the UniquePtr destructor registered at static construction time, rather than explicit shutdown later on.
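
A toy model of the ordering hazard described above, not the Gecko code: `ToyDetector`, `ToyTrackedMutex`, and `LateDestructions` are hypothetical names invented for illustration. It only shows that an owning smart pointer which is not reset explicitly will run its destructor at whatever point the owner goes away, which may be after the thing the destructor reports to has shut down.

```cpp
#include <memory>

// Hypothetical stand-in for the deadlock detector. Once `alive` is false,
// it can no longer accept reports from tracked objects.
struct ToyDetector {
  bool alive = true;
  int lateDestructions = 0;
};

// Hypothetical stand-in for a tracked mutex whose destructor reports to the
// detector. This model only counts the late destruction; the crash in this
// bug is an abort instead.
class ToyTrackedMutex {
 public:
  explicit ToyTrackedMutex(ToyDetector& aDetector) : mDetector(aDetector) {}
  ~ToyTrackedMutex() {
    if (!mDetector.alive) {
      ++mDetector.lateDestructions;
    }
  }

 private:
  ToyDetector& mDetector;
};

// Returns how many mutexes were destroyed after the detector shut down.
int LateDestructions(bool aExplicitShutdownFirst) {
  ToyDetector detector;
  {
    std::unique_ptr<ToyTrackedMutex> mutex =
        std::make_unique<ToyTrackedMutex>(detector);
    if (aExplicitShutdownFirst) {
      mutex.reset();  // explicit shutdown: destroyed while the detector is alive
    }
    detector.alive = false;  // the detector shuts down
  }  // if still owned, the unique_ptr destructor runs here, too late
  return detector.lateDestructions;
}
```

In this model `LateDestructions(true)` reports no late destruction, while `LateDestructions(false)` reports one, because the smart pointer's destructor is the only thing left to release the mutex.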

What I don't get (having not looked at this very long) is why are we not hitting this more often?
Flags: needinfo?(nfroyd)
Summary: Intermittent xpcshell-child-process.ini:dom/indexedDB/test/unit/test_transaction_abort.js,test_multientry.js | application crashed [@ moz_abort] → Intermittent xpcshell-child-process.ini:dom/indexedDB/test/unit/test_transaction_abort.js,test_multientry.js,test_blocked_order.js | application crashed [@ moz_abort]
IMO this is really a regression caused by bug 1035454, which introduced a huge footgun to paper over its own issues. I wouldn't be overly surprised if there were other similar bugs caused by this, but we don't have a way to filter on WinXP-only intermittent issues.

Ideally we would back out http://hg.mozilla.org/mozilla-central/rev/9134c098f0ee. It is stated to be temporary anyway. Otherwise we might want to consider using _exit(0) instead, which IIRC won't run static destructors.

Otherwise if we want to plaster over a plaster bug we can use OffTheBooksMutex =\.
Depends on: 1035454
Flags: needinfo?(continuation)
(In reply to Benoit Girard (:BenWa) from comment #4)
> Ideally we would backout
> http://hg.mozilla.org/mozilla-central/rev/9134c098f0ee. It is stated to be
> temporary anyways.

I can see if the issue in bug 1083664 went away, and if it is still around, see if the WebRTC people have a preference between investigating the issue and disabling the test on XP.

> Otherwise we might want to consider using _exit(0)
> instead which IIRC wont run static destructors.

Ah, good point. That's what ContentChild::QuickExit() does.
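
The exit() versus _exit() difference mentioned above can be checked with a small POSIX-only sketch. This is not Gecko code: `StaticOwner`, `gReportFd`, and `StaticDestructorRuns` are hypothetical names, and the sketch assumes `fork()` and `pipe()` are available, so it does not run on Windows as written. A child process ends with either call, and a static object's destructor reports through a pipe if it runs.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Only the forked child sets this, so the destructor stays silent in the parent.
static int gReportFd = -1;

// Hypothetical stand-in for a static object with a non-trivial destructor.
struct StaticOwner {
  ~StaticOwner() {
    if (gReportFd >= 0) {
      const char ran = 'D';
      ssize_t ignored = write(gReportFd, &ran, 1);
      (void)ignored;
    }
  }
};

static StaticOwner gOwner;

// Forks a child that terminates with _exit(0) or exit(0) and returns true
// if the child's static destructor ran on the way out.
bool StaticDestructorRuns(bool aUseUnderscoreExit) {
  int fds[2];
  int rc = pipe(fds);
  assert(rc == 0);
  (void)rc;

  std::fflush(nullptr);  // keep exit() in the child from re-flushing parent output
  pid_t pid = fork();
  assert(pid >= 0);
  if (pid == 0) {
    close(fds[0]);
    gReportFd = fds[1];
    if (aUseUnderscoreExit) {
      _exit(0);  // terminates immediately, skipping static destructors
    }
    exit(0);  // runs atexit handlers and static destructors first
  }

  close(fds[1]);
  char buf = 0;
  ssize_t n = read(fds[0], &buf, 1);  // returns 0 at EOF if nothing was written
  close(fds[0]);
  int status = 0;
  waitpid(pid, &status, 0);
  return n == 1 && buf == 'D';
}
```

With `exit(0)` the child's destructor writes its byte before the process ends; with `_exit(0)` the pipe closes without any data, so no late destructor has a chance to touch state that has already been torn down.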
Assignee: nobody → continuation
Flags: needinfo?(continuation)
Depends on: 1219919
Changing to _exit() seems to work:
  https://treeherder.mozilla.org/#/jobs?repo=try&revision=4be9c404009d
But the WebRTC intermittent failure seems to have gone away, so I'll try just removing this chunk of code once I've worked out a leak suppression that works. (Removing the block causes us to run leak checking on Windows XP, and we don't run it at all in m-c right now, so there are additional leaks revealed.)
This was disabled because it was causing intermittent failures in a test, but that failure seems to have stopped.

This will cause us to start doing leak checking in content processes on Windows XP. (We do not run leak checking on other Windows platforms either, due to bug 1219369.)

try run with a bunch of retriggers for the test suite with the WebRTC failure:

  https://treeherder.mozilla.org/#/jobs?repo=try&revision=edf6c3934483

The oranges are all leaks, because this patch makes it so that we actually run leak checking on Windows XP. I'm adding suppressions for these leaks in bug 1219919.

Here is a try run with those suppressions added:
  https://treeherder.mozilla.org/#/jobs?repo=try&revision=bdc3933abda0
Attachment #8715024 - Flags: review?(nfroyd)
This skips destructors, so hopefully it avoids some odd behaviors. My attempt to do a try run for this failed somehow, but it seems better than leaving it alone. I split this into a separate patch to make it easier to back out one without backing out the other. I also updated the comment.
Attachment #8715026 - Flags: review?(nfroyd)
I'm marking this MemShrink because the patch is needed to run leak checking on Windows XP content processes.
Whiteboard: [MemShrink]
Blocks: 1091917
Attachment #8715024 - Flags: review?(nfroyd) → review+
Comment on attachment 8715026 [details] [diff] [review]
part 2 - Use _exit(0) to exit in B2G debug content processes.

Review of attachment 8715026 [details] [diff] [review]:
-----------------------------------------------------------------

Why are we changing the B2G behavior in a Windows intermittent bug?  For consistency with the (former) Windows implementation?

r=me with answers to the above.
Attachment #8715026 - Flags: review?(nfroyd) → review+
(In reply to Nathan Froyd [:froydnj] from comment #10)
> Why are we changing the B2G behavior in a Windows intermittent bug?  For
> consistency with the (former) Windows implementation?

I'll file a separate bug for part 2. Benoit just pointed out in this bug that using exit() instead of _exit() is probably a bad idea, so I figured I'd fix it.
Filed bug 1245513 for the B2G changes.
Blocks: 1243949
https://hg.mozilla.org/mozilla-central/rev/f6c9abced33d
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
Flags: needinfo?(bgirard)