Closed Bug 1356365 Opened 3 years ago Closed 2 years ago
Crash in mozilla::ipc::Message
Channel::Will Destroy Current Message Loop
This bug was filed from the Socorro interface and is report bp-bce93404-ea89-4166-a0b5-53f842170413. ============================================================= Seen while looking at nightly crash data - crashes started using 20170413030227 build: http://bit.ly/2pbBhxo Possible regression window based on Build ID: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=f40e24f40b4c4556944c762d4764eace261297f5&tochange=819a666afddc804b6099ee1b3cff3a0fdf35ec15 Bug 1349699? ni on billm
Any chance you could look at this David? It's an assertion where the compositor thread shuts down before the PCompositorBridgeParent protocol has closed (I had to click on the raw JSON file to find the ProtocolName field). I guess this user doesn't have a GPU process.
Component: IPC → Graphics
Flags: needinfo?(wmccloskey) → needinfo?(dvander)
This is now the top browser crash on Nightly.
For now let's make this DEBUG-only. There's no reason to crash everybody.
Assignee: nobody → wmccloskey
Attachment #8859373 - Flags: review?(continuation)
Attachment #8859373 - Flags: review?(continuation) → review+
Pushed by email@example.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/edda32a96d46 Make MessageChannel::WillDestroyCurrentMessageLoop assertion DEBUG-only (r=mccr8)
Milan managed to accidentally reproduce this on try by enabling the GPU process in more places, and I think this is the likely cause. VideoBridgeParent and VsyncBridgeParent both run on the compositor thread, but don't retain a reference to CompositorThreadHolder. As long the holder is alive, the compositor thread will spin an event loop to wait for other threads to receive ActorDestroys. The fix is to just hold a ref in both of these actors.
Assignee: wmccloskey → dvander
Status: NEW → ASSIGNED
Attachment #8859919 - Flags: review?(matt.woodrow)
If this is the problem, why would the protocol be reported as PCompositorBridgeParent?
Probably we're seeing two different bugs, then. By xpcom-shutdown time, it's too late to report crash metadata in the GPU process, so I had to guess the protocol by looking at the stack. This does appear to fix the issue Milan saw on try.
Attachment #8859919 - Flags: review?(matt.woodrow) → review+
Pushed by firstname.lastname@example.org: https://hg.mozilla.org/integration/mozilla-inbound/rev/9fd7a385411f Fix a race condition between VideoBridge and Compositor thread shutdown. (bug 1356365, r=mattwoodrow)
No crashes after 20170418030220 so far, which doesn't really line up to when this landed. But the crash rate dropped quite a bit after 4/21.
I disabled the assertion on the 18th, so we need to re-enable it to see what was fixed.
Assignee: dvander → nobody
Status: ASSIGNED → NEW
[Tracking Requested - why for this release]: This bug is a top crasher.
This is the #1 top crasher for Nightly. Top Crashers for Firefox 57.0a1 Top 50 Crashing Signatures. 7 days ago through 34 minutes ago. #1 15.42% -7.29% mozilla::ipc::MessageChannel::WillDestroyCurrentMessageLoop application 1452 1437 15 0 1251 0 2017-04-13
Huge volume of crash, marking it as blocking Milan, do you know who could help on this? Thanks
It's the MessageLoop destroyed before MessageChannel ... MOZ_CRASH firing. Bill, David, this seems to have started recently (I think the oldest 57 build I see with this is 20170906100107 from https://crash-stats.mozilla.com/report/index/b76c1a69-a685-40c4-976d-d56530170909
Assignee: nobody → dvander
Priority: -- → P1
I'm betting on the sandboxing change in bug 1229829.
2 years ago
Assignee: dvander → nobody
Component: Graphics → Security: Process Sandboxing
(In reply to Milan Sreckovic [:milan] from comment #17) > I'm betting on the sandboxing change in bug 1229829. Is this based on anything other than it was close? It's certainly possible that the change would cause potentially odd regressions, but it landed in Nightly 20170905220108 and the number ramped up in 20170907220212 and has stayed consistently high since then. It seems odd that we didn't get a similar number for any of 20170905220108, 20170906100107, 20170906220306 or 20170907100318.
This is also a new signature for OSX starting at 9-7: https://crash-stats.mozilla.com/search/?signature=%3Dmozilla%3A%3Aipc%3A%3AMessageChannel%3A%3AWillDestroyCurrentMessageLoop&product=Firefox&platform=Mac%20OS%20X&date=%3E%3D2017-08-01T15%3A47%3A00.000Z&date=%3C2017-09-22T15%3A47%3A00.000Z&_sort=-date&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#crash-reports
The new offending protocol in comment #19 is PGMPParent.
(In reply to Bob Owen (:bobowen) from comment #18) > ... > > It seems odd that we didn't get a similar number for any of 20170905220108, > 20170906100107, 20170906220306 or 20170907100318. https://crash-stats.mozilla.com/report/index/b76c1a69-a685-40c4-976d-d56530170909 is 20170906100107 But, no, I don't have any data that points to sandboxing beyond the timing.
(In reply to Milan Sreckovic [:milan] from comment #21) > (In reply to Bob Owen (:bobowen) from comment #18) > > ... > > > > It seems odd that we didn't get a similar number for any of 20170905220108, > > 20170906100107, 20170906220306 or 20170907100318. > > https://crash-stats.mozilla.com/report/index/b76c1a69-a685-40c4-976d- > d56530170909 is 20170906100107 Right that's only 1 report though, this also seemed to start on Mac in build 20170907220212. So my guess is this isn't the alternate desktop, but we're going to back it out of Beta anyway for bug 1400637, so we'll know one way or the other soon.
The recent spike was caused by the landing of bug 1395330. It's not related to sandboxing. There's more discussion in bug 1398070.
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 1398070
Removing leave-open keyword from resolved bugs, per :sylvestre.
You need to log in before you can comment on or make changes to this bug.