Closed
Bug 1398070
Opened 7 years ago
Closed 7 years ago
Crash in mozilla::ipc::MessageChannel::WillDestroyCurrentMessageLoop
Categories
(Core :: IPC, defect, P1)
Tracking
()
RESOLVED
FIXED
mozilla58
Tracking | Status | |
---|---|---|
firefox-esr52 | --- | unaffected |
firefox55 | --- | unaffected |
firefox56 | + | unaffected |
firefox57 | + | fixed |
firefox58 | --- | fixed |
People
(Reporter: calixte, Assigned: billm)
References
Details
(4 keywords)
Crash Data
Attachments
(1 file)
3.31 KB,
patch
|
dvander
:
review+
ritu
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
This bug was filed from the Socorro interface and is
report bp-1765336c-d113-4cb1-b2ef-6fafc0170908.
=============================================================
There are 17 crashes in nightly 57 with buildid 20170907220212.
The pushlog for this nightly:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=d8e238b811d3dc74515065ae8cab6c74baf0295f&tochange=b4c1ad9565ee9d00d96501c4a83083daf25c1413
For information, the ProtocolName field is "PProcessHangMonitorParent".
:billm, could you investigate please ?
Flags: needinfo?(wmccloskey)
Comment 1•7 years ago
|
||
This is the #5 Windows topcrash in Nightly 20170908220146.
Comment 2•7 years ago
|
||
It looks like this is also happening frequently in automation.
See Also: → 1388764
Assignee | ||
Comment 3•7 years ago
|
||
Any chance you could take a look at this, Blake? I'm on PTO this week.
If necessary, we can back out bug 1395330 (for the second time). But it would be nice to know what's happening with ProcessHangMonitor. The code looks correct to me on quick inspection. I wonder if this is a side effect of a content process being shut down too late somehow.
Flags: needinfo?(wmccloskey)
Updated•7 years ago
|
Assignee: nobody → mrbkap
Flags: needinfo?(mrbkap)
Comment 5•7 years ago
|
||
Pretty much all of the reports have MOZ_CRASH(MessageLoop destroyed before MessageChannel that's bound to it) as crash reason.
Tracking as topcrash.
tracking-firefox57:
--- → +
Comment 7•7 years ago
|
||
This is the #4 Windows topcrash in Nightly 20170915100121.
Comment 8•7 years ago
|
||
I wonder if bug 1397214 and family could be related to this?
Comment 9•7 years ago
|
||
as a note, I have tried to narrow down the failure in reftests in bug 1388764 comment 20.
Comment 10•7 years ago
|
||
billm theorized that this has to do with a child process lagging on shutdown. From my reading of the code, that seems to be the only likely cause. However, I don't really understand the ContentParent::KillHard path, so it's taking me some time to dig into what happens in that code. I hope to figure something out by tomorrow.
Comment 11•7 years ago
|
||
I'm not sure if the two issues are related, but at least some of the crashes are for the Compositor thread. The offending protocol for these appears to be PImageBridgeParent. We don't do any special shutdown of the protocol from the parent side, only release a static reference to the protocol object, we rely upon the child to call Close(), which clearly didn't happen. I do not know for certain, but I fear this may be related to bug 1389021.
(In reply to Andrew Osmond [:aosmond] from comment #11)
> ... but I fear this may be related to bug 1389021.
In which case the next 56 build starts showing this problem. Heads up Liz.
Flags: needinfo?(lhenry)
Comment 13•7 years ago
|
||
Tracking a potential fix for the Compositor thread crash reports in bug 1401668.
See Also: → 1401668
Updated•7 years ago
|
Comment 14•7 years ago
|
||
(In reply to Andrew Osmond [:aosmond] from comment #13)
> Tracking a potential fix for the Compositor thread crash reports in bug
> 1401668.
This appears to have solved the compositor thread crashes, as we have gotten a good number of reports on 20170921100141, but none for PImageBridgeParent. For what it was worth, this appears to be related to a quick cycle between startup and shutdown, as the solution could only have fixed the crash if we shutdown in the small window between the MessageChannel opening and OnChannelConnected (it does a message post). It is possible the other crashes related to this.
Updated•7 years ago
|
OS: Windows 10 → All
Comment 15•7 years ago
|
||
This is the #1 top crasher for Nightly.
Top Crashers for Firefox 57.0a1
Top 50 Crashing Signatures. 7 days ago through 34 minutes ago.
#1
15.42%
-7.29%
mozilla::ipc::MessageChannel::WillDestroyCurrentMessageLoop
application
1452 1437 15 0 1251 0 2017-04-13
Assignee | ||
Comment 16•7 years ago
|
||
This patch covers up the problem. Rather than crash, it just causes us to avoid ever posting to the MessageLoop that's been shut down. It's safe in some sense, although it's really just covering up bugs elsewhere. Given the large volume of crashes, I think we should probably do something like this.
Attachment #8911416 -
Flags: review?(dvander)
Comment on attachment 8911416 [details] [diff] [review]
wallpaper patch
Review of attachment 8911416 [details] [diff] [review]:
-----------------------------------------------------------------
Agree. Andrew and I discussed another alternative, requiring MessageChannels to be created with a refcounted object, to keep the associated thread alive.
Attachment #8911416 -
Flags: review?(dvander) → review+
Reporter | ||
Comment 19•7 years ago
|
||
For information, there are 1248 crashes for 943 installations in 57.0b0 (dev edition).
Comment 20•7 years ago
|
||
¡Hola!
This one is in 58 as well.
It bite me twice while trying to help :bobowen pin down https://bugzilla.mozilla.org/show_bug.cgi?id=1316665
bp-7013c932-c6e3-491c-acd2-b35060170925
9/25/2017 9:10 AM
bp-ce8d1c52-d587-4787-ae0c-3efd70170925
9/25/2017 8:16 AM
¡Gracias!
Alex
status-firefox58:
--- → affected
Comment 21•7 years ago
|
||
Pushed by wmccloskey@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/05e921246981
Convert WillDestroyCurrentMessageLoop assertion to a safe no-op (r=dvander)
Updated•7 years ago
|
Assignee: mrbkap → wmccloskey
Comment 22•7 years ago
|
||
bugherder |
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla58
Comment 23•7 years ago
|
||
Can we flag this for uplift to 57? It's showing up there as well.
Flags: needinfo?(wmccloskey)
Comment 24•7 years ago
|
||
how do we know this is fixed? we still see test automation failures in bug 1388764 :(
Comment 25•7 years ago
|
||
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #24)
> how do we know this is fixed? we still see test automation failures in bug
> 1388764 :(
The patch in this bug simply disabled the crash in non-debug builds, so it is expect that we'll still see it in automation for debug builds.
Comment 26•7 years ago
|
||
To whomever gets to it first, we're building beta4 tomorrow (Thursday) and this crash is one of our top crashes on 57 at the moment. Can we please get an approval request ASAP?
Flags: needinfo?(dvander)
Assignee | ||
Comment 27•7 years ago
|
||
Comment on attachment 8911416 [details] [diff] [review]
wallpaper patch
Normally I would wait a few days to see the effect that this patch has on a number of related crashes. If we need to rush this, though, it's probably pretty safe. At worst we'll reduce overall crash rates but re-introduce a sec-critical crash. If that happens, we could always back the patch out, though.
Approval Request Comment
[Feature/Bug causing the regression]:bug 1395330
[User impact if declined]:crashes, mostly at shutdown
[Is this code covered by automated tests?]:a little bit
[Has the fix been verified in Nightly?]:no
[Needs manual test from QE? If yes, steps to reproduce]: none available
[List of other uplifts needed for the feature/fix]: none
[Is the change risky?]:if the patch fails to work, we could re-introduce crashes in PostTask_Helper, which are sec-critical
[Why is the change risky/not risky?]:see above
[String changes made/needed]:none
Flags: needinfo?(wmccloskey)
Attachment #8911416 -
Flags: approval-mozilla-beta?
Comment 28•7 years ago
|
||
I will note that the crashes on Nightly with this signature stopped after the second 09-25 Nightly build, which was the final one without this patch.
Updated•7 years ago
|
Comment on attachment 8911416 [details] [diff] [review]
wallpaper patch
I had a quick chat with Dan and the stability impact here is severe enough to uplift this in beta57 asap.
Attachment #8911416 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Updated•7 years ago
|
Flags: needinfo?(dvander)
Comment 30•7 years ago
|
||
bugherder uplift |
You need to log in
before you can comment on or make changes to this bug.
Description
•