Closed Bug 737437 Opened 8 years ago Closed 8 years ago
crash in mozilla::ipc::RPCChannel::OnMaybeDequeueOne when quitting
It's the #3 top crasher over the last day. It first appeared in 14.0a1/20120320043530. The regression range is: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=58a2cd0203ee&tochange=ee554888d071 It might be a regression from bug 731603.

Signature: mozilla::ipc::RPCChannel::OnMaybeDequeueOne
UUID: fd96ad9a-7263-429b-ac5f-b180b2120320
Date Processed: 2012-03-20 14:54:13
Uptime: 31
Last Crash: 1.9 minutes before submission
Install Age: 4.3 minutes since version was first installed.
Install Time: 2012-03-20 14:48:53
Product: FennecAndroid
Version: 14.0a1
Build ID: 20120320043530
Release Channel: nightly
OS: Linux
OS Version: 0.0.0 Linux 188.8.131.52-g7b95729 #1 PREEMPT Mon Jun 13 10:34:37 CST 2011 armv7l
Build Architecture: arm
Build Architecture Info:
Crash Reason: SIGSEGV
Crash Address: 0x0
App Notes: EGL? EGL+ AdapterVendorID: vision, AdapterDeviceID: HTC Vision. AdapterDescription: 'Android, Model: 'HTC Vision', Product: 'htc_vision', Manufacturer: 'HTC', Hardware: 'vision''. GL Context? GL Context+ GL Layers? GL Layers+ HTC HTC Vision htc_wwe/htc_vision/vision:2.3.3/GRI40/84109:user/release-keys
EMCheckCompatibility: True

Frame Module Signature Source
0 libxul.so mozilla::ipc::RPCChannel::OnMaybeDequeueOne Mutex.h:106
1 libxul.so RunnableMethod<mozilla::ipc::RPCChannel, bool , Tuple0>::Run ipc/chromium/src/base/tuple.h:383
2 libxul.so mozilla::ipc::RPCChannel::DequeueTask::Run RPCChannel.h:462
3 libxul.so MessageLoop::RunTask ipc/chromium/src/base/message_loop.cc:318
4 libxul.so MessageLoop::DeferOrRunPendingTask ipc/chromium/src/base/message_loop.cc:326
5 libxul.so MessageLoop::DoWork ipc/chromium/src/base/message_loop.cc:426
6 libxul.so mozilla::ipc::MessagePump::Run ipc/glue/MessagePump.cpp:114
7 libxul.so MessageLoop::RunInternal ipc/chromium/src/base/message_loop.cc:208
8 libxul.so MessageLoop::Run ipc/chromium/src/base/message_loop.cc:201
9 libxul.so nsBaseAppShell::Run widget/xpwidgets/nsBaseAppShell.cpp:189
10 libxul.so nsAppStartup::Run toolkit/components/startup/nsAppStartup.cpp:295
11 libxul.so XRE_main toolkit/xre/nsAppRunner.cpp:3703
12 libxul.so GeckoStart toolkit/xre/nsAndroidStartup.cpp:109
13 libmozglue.so __res_nsend other-licenses/android/res_send.c:1086
...
More reports at: https://crash-stats.mozilla.com/report/list?signature=mozilla%3A%3Aipc%3A%3ARPCChannel%3A%3AOnMaybeDequeueOne
This is suspected of being a regression from bug 731603. I see that some more bits of it landed today; can you comment on how it's related?
(In reply to Brad Lassey [:blassey] from comment #2) > can you comment on how its related? gfx/layers/ipc has been modified in this bug.
This crash is only happening on devices with Adreno GPUs (and the vast majority of the crashes are on the HTC Vision).
Component: IPC → General
Product: Core → Fennec Native
QA Contact: ipc → general
Summary: crash in mozilla::ipc::RPCChannel::OnMaybeDequeueOne → crash in mozilla::ipc::RPCChannel::OnMaybeDequeueOne on devices with Adreno GPUs
Version: 14 Branch → unspecified
HTC Desire Z (A 205) and EVO 3D (A 220) crashes were also reported in bug 737477 (which could be a symptom of this crash).
(In reply to Aaron Train [:aaronmt] from comment #5) > HTC Desire Z (A 205), and EVO 3D (A 220) reported in bug 737477 (could be a > symptom of this crash) too. Yes, these sound like the same problem (particularly in light of Bug 737477 Comment 4).
I'm running into a crash with this same signature on my Galaxy Nexus, on Fennec Quit (Menu -> Quit), with today's (03/22) latest inbound build:
bp-697a989b-7c19-4595-a961-63ec32120322
bp-4efc4f0b-a83e-48f8-99bc-c978e2120322
bp-f176c5d2-4858-48e5-95f4-f04752120322
That would indicate that this is not strictly related to Adreno ...
(In reply to Aaron Train [:aaronmt] from comment #8) > That would indicate that this is not strictly related to Adreno ... True, although it might be that the crash you're hitting has a different cause (the stacks look very different from the rest).
So far, these crashes occur on:
* HTC Glacier, Vision, Desire, Desire HD, Desire HD A9191, ADR6400L, ThunderBolt, Incredible S, Nexus One
* Samsung SCH-I510, Galaxy Nexus
* Sony Ericsson SO-02C, R800x, MT15i
Assignee: nobody → ajuma
blocking-fennec1.0: ? → beta+
(In reply to Scoobidiver from comment #10) > So far, these crashes occur on: > * HTC Glacier, Vision, Desire, Desire HD, Desire HD A9191, ADR6400L, > ThunderBolt, Incredible S, Nexus One > * Samsung SCH-I510, Galaxy Nexus > * Sony Ericsson SO-02C, R800x, MT15i Updating the list with: Samsung SAMSUNG-SGH-I897 (Samsung Captivate). This crash occurred on the latest Nightly 03/23: https://crash-stats.mozilla.com/report/index/bp-e5d6483a-9070-4973-bbc5-61c7e2120323
Perhaps the title should be changed, since the Samsung Captivate has a PowerVR SGX540 GPU.
Summary: crash in mozilla::ipc::RPCChannel::OnMaybeDequeueOne on devices with Adreno GPUs → crash in mozilla::ipc::RPCChannel::OnMaybeDequeueOne
Reproducible on Nightly/14.0a1 2012-03-23 on a Motorola Droid 2 running Android 2.3.3 using the following scenario:
1. Open the Add-ons page.
2. Install an add-on that requires a restart (Cute Buttons - Crystal SVG or Copy as Plain Text).
3. When asked, tap Restart.
Actual result: Fennec crashes when Nightly is reopened.
This bug causes graphical defects on restart on an HTC Desire HD.
1. Go to Menu -> Quit.
2. Crash; select Restart.
3. Go to about:crashes.
Expected: a list of crashes.
Actual: a graphical defect: http://www.youtube.com/watch?v=b5edb_Ljm64&feature=youtube_gdata_player
To be clear:
1. Start Nightly
2. Quit Nightly
*BOOM*
I have no add-ons installed and no Sync set up. The crash happens 100% of the time.
(In reply to Mark Finkle (:mfinkle) from comment #15) > To be clear: > 1. Start Nightly > 2. Quit Nightly > > *BOOM* > > I have no add-ons installed and no Sync setup. The crash happens 100% of the > time. Using a Galaxy Nexus
Update: I uninstalled Flash from my phone and rebooted. Nightly still crashes on exit.
(In reply to Mark Finkle (:mfinkle) from comment #17) > Update: I uninstalled Flash from my phone and rebooted. Nightly still > crashes on exit. Here is my crash without Flash: https://crash-stats.mozilla.com/report/index/b1f63fed-2282-4512-a59d-7ecea2120324
I get the following stack when quitting on an HTC Desire:

#0 mozilla::ipc::RPCChannel::OnMaybeDequeueOne (this=0x4a010828) at /home/ajuma/mozilla-central/ipc/glue/RPCChannel.cpp:403
#1 0x728b39b6 in DispatchToMethod<mozilla::plugins::PluginInstanceChild, void (mozilla::plugins::PluginInstanceChild::*)()> (arg=<optimized out>, method=<optimized out>, obj=<optimized out>) at /home/ajuma/mozilla-central/ipc/chromium/src/base/tuple.h:383
#2 RunnableMethod<mozilla::plugins::PluginInstanceChild, void (mozilla::plugins::PluginInstanceChild::*)(), Tuple0>::Run (this=<optimized out>) at /home/ajuma/mozilla-central/ipc/chromium/src/base/task.h:307
#3 0x728c0f28 in Run (this=<optimized out>) at ../../dist/include/mozilla/ipc/RPCChannel.h:462
#4 mozilla::ipc::RPCChannel::DequeueTask::Run (this=<optimized out>) at ../../dist/include/mozilla/ipc/RPCChannel.h:485
#5 0x72964adc in MessageLoop::RunTask (this=0x46e4c0e0, task=0x45be8040) at /home/ajuma/mozilla-central/ipc/chromium/src/base/message_loop.cc:318
#6 0x7296590a in MessageLoop::DeferOrRunPendingTask (this=0x45bd753c, pending_task=<optimized out>) at /home/ajuma/mozilla-central/ipc/chromium/src/base/message_loop.cc:326
#7 0x729664b8 in MessageLoop::DoWork (this=0x46e4c0e0) at /home/ajuma/mozilla-central/ipc/chromium/src/base/message_loop.cc:426
#8 0x728c09e0 in mozilla::ipc::MessagePump::Run (this=0x46e271c0, aDelegate=0x46e4c0e0) at /home/ajuma/mozilla-central/ipc/glue/MessagePump.cpp:114
#9 0x72964a8c in MessageLoop::RunInternal (this=0x72d896c9) at /home/ajuma/mozilla-central/ipc/chromium/src/base/message_loop.cc:208
#10 0x72964b42 in RunHandler (this=<optimized out>) at /home/ajuma/mozilla-central/ipc/chromium/src/base/message_loop.cc:201
#11 MessageLoop::Run (this=0x46e4c0e0) at /home/ajuma/mozilla-central/ipc/chromium/src/base/message_loop.cc:175
#12 0x72855c5c in nsBaseAppShell::Run (this=0x46e28620) at /home/ajuma/mozilla-central/widget/xpwidgets/nsBaseAppShell.cpp:189
#13 0x7279e840 in nsAppStartup::Run (this=0x46277670)

Not 100% sure this is the same crash, since this stack has PluginInstanceChild in frame #1 but the stacks on crash-stats don't. This initially made me suspect Flash, but Comment 17 disproves that theory.
Summary: crash in mozilla::ipc::RPCChannel::OnMaybeDequeueOne → crash in mozilla::ipc::RPCChannel::OnMaybeDequeueOne when quiting
Testing on a Nexus S using mozilla-inbound tinderbox builds, I found a different regression range: https://hg.mozilla.org/integration/mozilla-inbound/rev/a5ac2a7b72c6 doesn't crash but https://hg.mozilla.org/integration/mozilla-inbound/rev/80a7d26b02ec does crash. This means that on PowerVR devices, the regression is caused by Bug 737686 (which landed on inbound on March 21). That bug made our texture upload behaviour on PowerVR consistent with our behaviour on Adreno (that is, we avoid using glTexSubImage2D). This explains why we initially only saw this crash on Adreno devices. We still need to find what caused the regression on Adreno; I don't have an Adreno device with me today, but if someone who does can bisect the regression range from Comment 0 using inbound tinderbox builds, that would be very helpful!
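Bisecting a regression range like this amounts to a binary search over the ordered list of inbound builds. The following is a minimal sketch, assuming the crash reproduces deterministically on quit and, once introduced, stays in every later build; `FirstBadBuild` and the `crashes` callback are hypothetical stand-ins for "install this tinderbox build, quit Fennec, and see whether it crashed", not part of any Mozilla tool.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Binary search for the first crashing build in an ordered regression range.
// Preconditions (assumed): builds.front() is known good, builds.back() is
// known bad, and once the crash appears it stays in every later build.
// `crashes` is a hypothetical stand-in for "install this build, quit Fennec,
// and check whether it crashed".
std::size_t FirstBadBuild(const std::vector<std::string>& builds,
                          const std::function<bool(const std::string&)>& crashes) {
  assert(builds.size() >= 2);
  std::size_t good = 0;                 // invariant: builds[good] is good
  std::size_t bad = builds.size() - 1;  // invariant: builds[bad] is bad
  while (bad - good > 1) {
    std::size_t mid = good + (bad - good) / 2;
    if (crashes(builds[mid])) {
      bad = mid;   // crash already present: look earlier
    } else {
      good = mid;  // still fine: look later
    }
  }
  return bad;  // index of the first build that crashes
}
```

Each probe halves the remaining range, so a range of N changesets needs roughly log2(N) test builds instead of N.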
Summary: crash in mozilla::ipc::RPCChannel::OnMaybeDequeueOne when quiting → crash in mozilla::ipc::RPCChannel::OnMaybeDequeueOne when quitting
This bug should be a high priority because it is our #1 topcrash. This crash is about 5x more common than the #2 topcrash!
This points to RPCChannel::mMonitor being NULL. I don't know how that can happen, though.
It means that AsyncChannel::Clear() has been called, which likely means that something is trying to IPC after ActorDestroy(). That's not allowed.
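The failure mode can be illustrated with a toy model. This is a hypothetical, heavily simplified sketch: `ToyChannel` is not the real AsyncChannel/RPCChannel, and a `std::mutex` held through a `std::unique_ptr` stands in for the monitor. Once `Clear()` has released the monitor, a task that still tries to lock it would dereference a null pointer, which is consistent with the 0x0 crash address in the reports; the sketch checks for that case instead of crashing.

```cpp
#include <cassert>
#include <memory>
#include <mutex>

// Hypothetical, simplified channel whose monitor is released on teardown.
class ToyChannel {
 public:
  ToyChannel() : monitor_(std::make_unique<std::mutex>()) {}

  // Teardown: after this, the channel must not be used for IPC again.
  void Clear() { monitor_.reset(); }

  bool IsCleared() const { return monitor_ == nullptr; }

  // A queued "dequeue one message" task. Locking the monitor unconditionally
  // here would dereference null after Clear(); this version detects it.
  bool OnMaybeDequeueOne() {
    if (!monitor_) {
      return false;  // IPC after teardown: a caller bug, not a normal path
    }
    std::lock_guard<std::mutex> lock(*monitor_);
    ++dequeued_;
    return true;
  }

  int dequeued() const { return dequeued_; }

 private:
  std::unique_ptr<std::mutex> monitor_;
  int dequeued_ = 0;
};
```

A check like this only hides the symptom; the underlying problem is that some code is still doing IPC after ActorDestroy(), and that is what has to be fixed.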
This makes the crash go away (and confirms what Joe said).
Comment on attachment 609463 bandaid Nothing about https://hg.mozilla.org/mozilla-central/rev/734c1ef36151 looks obviously wrong to me.
Attachment #609463 - Flags: review?(jones.chris.g)
The problem seems to be that the CompositorParent is deallocating shared memory after the CompositorChild has already been destroyed. This patch prevents that.
Comment on attachment 609540 Don't deallocate shared memory after destruction This isn't quite right. We need to check if the layer manager is destroyed, not if the layer itself is destroyed.
(In reply to Ali Juma [:ajuma] from comment #29) > This isn't quite right. We need to check if the layer manager is destroyed, > not if the layer itself is destroyed. Probably an even better approach is to destroy the CompositorParent's layer manager before the CompositorChild is destroyed.
To recap, the problem is that when the CompositorParent's layer manager is destroyed, this triggers the destruction of shared memory held by ShadowThebesLayers, which in turn triggers IPC. Since we currently destroy the CompositorChild before destroying the CompositorParent's layer manager, this IPC is arriving too late. This makes us destroy the CompositorParent's layer manager first, then process any resulting IPC, and then destroy the CompositorChild.
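The ordering problem can be modelled in a few lines. This is a toy sketch, not the actual patch: `ToyChild`, `ToyIPC` and `ToyLayerManager` are hypothetical stand-ins for the CompositorChild, the IPC channel and the CompositorParent's layer manager, and messages delivered after the child is gone are counted rather than crashing.

```cpp
#include <cassert>
#include <deque>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for CompositorChild: it just records what it receives.
struct ToyChild {
  std::vector<std::string> received;
};

// Hypothetical stand-in for the IPC channel between parent and child.
struct ToyIPC {
  std::deque<std::string> pending;  // sent but not yet delivered
  std::unique_ptr<ToyChild> child = std::make_unique<ToyChild>();
  int late_messages = 0;            // delivered after the child was gone

  void Send(const std::string& msg) { pending.push_back(msg); }

  void ProcessPending() {
    while (!pending.empty()) {
      if (child) {
        child->received.push_back(pending.front());
      } else {
        ++late_messages;  // in the real bug, roughly where the crash happened
      }
      pending.pop_front();
    }
  }

  void DestroyChild() { child.reset(); }
};

// Hypothetical stand-in for the CompositorParent's layer manager: destroying
// it frees shared memory held by shadow layers, and each free sends a message.
class ToyLayerManager {
 public:
  ToyLayerManager(ToyIPC& ipc, int shmem_segments)
      : ipc_(ipc), shmem_segments_(shmem_segments) {}
  ~ToyLayerManager() {
    for (int i = 0; i < shmem_segments_; ++i) ipc_.Send("dealloc-shmem");
  }

 private:
  ToyIPC& ipc_;
  int shmem_segments_;
};

// Order before the patch: child destroyed first, layer manager second.
int LateMessagesOldOrder() {
  ToyIPC ipc;
  auto layer_manager = std::make_unique<ToyLayerManager>(ipc, 2);
  ipc.DestroyChild();
  layer_manager.reset();  // queues two dealloc messages...
  ipc.ProcessPending();   // ...which now have nowhere to go
  return ipc.late_messages;
}

// Order described for the patch: layer manager first, then process the
// resulting IPC, then destroy the child.
int LateMessagesNewOrder() {
  ToyIPC ipc;
  auto layer_manager = std::make_unique<ToyLayerManager>(ipc, 2);
  layer_manager.reset();
  ipc.ProcessPending();
  ipc.DestroyChild();
  return ipc.late_messages;
}
```

With the old order, both deallocation messages arrive after the child no longer exists; with the patched order, they are all delivered before the child goes away.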
(In reply to Ali Juma [:ajuma] from comment #31) > This makes us destroy the CompositorParent's layer manager first, then > process any resulting IPC, and then destroy the CompositorChild. "This *patch* makes us..."
Comment on attachment 609463 bandaid If this happens, someone has violated the IPC contract.
Attachment #609463 - Flags: review?(jones.chris.g) → review-
(In reply to Ali Juma [:ajuma] from comment #31) > Created attachment 609742 > Destroy the compositor's layer manager before the CompositorChild gets > destroyed. > > To recap, the problem is that when the CompositorParent's layer manager is > destroyed, this triggers the destruction of shared memory held by > ShadowThebesLayers, which in turn triggers IPC. Since we currently destroy > the CompositorChild before destroying the CompositorParent's layer manager, > this IPC is arriving too late. > > This makes us destroy the CompositorParent's layer manager first, then > process any resulting IPC, and then destroy the CompositorChild. Will take a look at this tonight. Need to page in a lot of stuff. Sorry for the delays here.
Attachment #609742 - Flags: review?(jones.chris.g) → review+
Target Milestone: --- → Firefox 14
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/5016d3f2b36d for native Talos bustage
(In reply to Phil Ringnalda (:philor) from comment #36) > Backed out in > https://hg.mozilla.org/integration/mozilla-inbound/rev/5016d3f2b36d for > native Talos bustage These failures seem to be caused by calling MessageLoop::current()->RunAllPending() in nsBaseWidget's destructor (the purpose of this call was to ensure that any pending IPC got processed before the CompositorChild got destroyed). I'm working on a patch that, instead of making this call, adds an event to the MessageLoop to handle compositor destruction.
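The follow-up approach, posting compositor destruction as a task rather than calling RunAllPending() from nsBaseWidget's destructor, relies on the message loop running tasks in the order they were posted. A hypothetical sketch: `ToyLoop` and `ToyWidget` are illustrative stand-ins, not Chromium's MessageLoop or Gecko's nsBaseWidget, and it assumes the pending IPC work is already queued on the same loop.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Minimal FIFO task queue, a hypothetical stand-in for a message loop.
class ToyLoop {
 public:
  void PostTask(std::function<void()> task) {
    tasks_.push_back(std::move(task));
  }

  // Runs queued tasks in posting order until none remain, including any
  // tasks posted by the tasks themselves.
  void Run() {
    while (!tasks_.empty()) {
      std::function<void()> task = std::move(tasks_.front());
      tasks_.pop_front();
      task();
    }
  }

 private:
  std::deque<std::function<void()>> tasks_;
};

// Hypothetical widget whose destructor defers compositor teardown by posting
// it to the loop instead of draining the loop synchronously.
class ToyWidget {
 public:
  ToyWidget(ToyLoop& loop, std::vector<std::string>& log)
      : loop_(loop), log_(log) {}

  ~ToyWidget() {
    std::vector<std::string>* log = &log_;
    // Anything already queued (e.g. pending IPC) runs before this task.
    loop_.PostTask([log] { log->push_back("destroy-compositor"); });
  }

 private:
  ToyLoop& loop_;
  std::vector<std::string>& log_;
};
```

The destructor itself only enqueues work and returns; the teardown happens later, when the loop reaches that task.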
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
I'm definitely not seeing this anymore!
Status: RESOLVED → VERIFIED