Closed Bug 1372880 Opened 2 years ago Closed 2 years ago

Crash in [@ mozilla::wr::WebRenderAPI::SetRootPipeline ]

Categories

(Core :: Graphics: WebRender, defect, critical)

56 Branch
x86_64
Linux
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla56
Tracking Status
firefox-esr52 --- unaffected
firefox54 --- unaffected
firefox55 --- disabled
firefox56 --- fixed

People

(Reporter: darkspirit, Assigned: sotaro)

References

(Depends on 1 open bug)

Details

(Keywords: crash, csectype-nullptr, nightly-community)

Crash Data

Attachments

(1 file, 2 obsolete files)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:56.0) Gecko/20100101 Firefox/56.0
Build ID: 20170613100218

Steps to reproduce:

1. https://addons.mozilla.org/firefox/addon/chrome-store-foxified/
2. save as file: https://chrome.google.com/webstore/detail/ipvfoo/ecanpcehffngcegjmadlcijfolapggal (it's like IPvFox, but a webextension)
3. prefs
extensions.interposition.enabled;false
extensions.legacy.enabled;false
extensions.legacy.exceptions;
extensions.webextensions.remote;true
gfx.webrender.enabled;true
xpinstall.signatures.required;false
4. install addon from file 
5. visit any website, you see a red 4 or green 6 inside the address bar on the right side.
click on it.
6. There is a crash with OOP Webextensions enabled:
bp-461fcf71-87a4-4f3e-8caa-2c50a0170614 [@ mozilla::layers::PWebRenderBridgeChild::SendCreate ]
bp-d4e54251-08c1-48e3-a9e9-968d10170614 [@ mozilla::wr::WebRenderAPI::SetRootPipeline ]
Crash Signature: [@ mozilla::wr::WebRenderAPI::SetRootPipeline ]
Has STR: --- → yes
OS: Unspecified → Linux
Hardware: Unspecified → x86_64
See Also: → 1350408
I also have (might not be relevant)
browser.tabs.remote.force-enable;true
dom.ipc.processCount;100
gl.require-hardware;true
layers.acceleration.force-enabled;true
layers.gpu-process.enabled;true
layers.gpu-process.force-enabled;true
layers.gpu-process.max_restarts;50
plugin.allowed_types;
plugin.default.state;0
plugin.disable;true
plugin.state.flash;0
plugin.state.java;0
privacy.popups.disable_from_plugins;3

The crash is not always on the first click. There can be a small delay.
bp-0c9b9896-d5ba-47c6-8b49-527cf0170614 [@ mozilla::wr::WebRenderAPI::SetRootPipeline ] (this bug)
bp-96d0c3a8-2f80-4c19-a800-8e5770170614 [@ mozilla::layers::PWebRenderBridgeChild::SendCreate ] (bug 1350408)
retested with Nightly 56 (20170620100236) @ Debian testing:
Click on a webextension icon inside the address bar. Browser crash.

https://addons.mozilla.org/firefox/addon/flag-plus/

extensions.interposition.enabled;false
extensions.legacy.enabled;false
extensions.webextensions.remote;true
gfx.webrender.enabled;true
xpinstall.signatures.required;false

bp-030fb3bc-392d-4cff-978c-dbc2e0170620 20.06.17 16:47 [@ mozilla::wr::WebRenderAPI::SetRootPipeline ]
= this bug
bp-ae1e905b-ddb5-446d-b432-a8bed0170620 20.06.17 16:47 [@ mozilla::layers::PWebRenderBridgeChild::SendCreate ]
= bug 1350408

A Win10 test will follow later.
Severity: normal → critical
I had webrender and stylo enabled but NOT oop webextensions and am using Nightly (20170725144053) with Fedora 26 x64. 

https://crash-stats.mozilla.com/report/index/8731324c-1e96-47aa-a7ba-d385d0170726
Status: UNCONFIRMED → NEW
Ever confirmed: true
Assignee: nobody → sotaro.ikeda.g
Bug 1350408 addressed the crash of client side. But it did not addressed the problem of parent side.
By changing source, if CompositorBridgeParent::AllocPWebRenderBridgeParent() returned nulptr, it caused crash at PCompositorBridgeChild::SendPWebRenderBridgeConstructor(). Then it is better to return valid PWebRenderBridgeParent*.

The following was call stack.

(gdb) info stack
#0  mozalloc_abort (
    msg=msg@entry=0x7fffffff85d0 "[Parent 26129] ###!!! ABORT: IPDL error [PCompositorBridgeChild]: \"constructor for actor failed\". abort()ing as a result.: file /home/sotaro/firefox_qr_dsk/mozilla-central/ipc/glue/ProtocolUtils.cpp, "...)
    at /home/sotaro/firefox_qr_dsk/mozilla-central/memory/mozalloc/mozalloc_abort.cpp:33
#1  0x00007fffe98c6cce in Abort (
    aMsg=0x7fffffff85d0 "[Parent 26129] ###!!! ABORT: IPDL error [PCompositorBridgeChild]: \"constructor for actor failed\". abort()ing as a result.: file /home/sotaro/firefox_qr_dsk/mozilla-central/ipc/glue/ProtocolUtils.cpp, "...)
    at /home/sotaro/firefox_qr_dsk/mozilla-central/xpcom/base/nsDebugImpl.cpp:451
#2  NS_DebugBreak (aSeverity=<optimized out>, aStr=<optimized out>, aExpr=<optimized out>, 
    aFile=0x7fffede52e60 "/home/sotaro/firefox_qr_dsk/mozilla-central/ipc/glue/ProtocolUtils.cpp", aLine=306)
    at /home/sotaro/firefox_qr_dsk/mozilla-central/xpcom/base/nsDebugImpl.cpp:438
#3  0x00007fffe9e76e8a in mozilla::ipc::FatalError (aProtocolName=aProtocolName@entry=0x7fffedfe723e "PCompositorBridgeChild", 
    aMsg=aMsg@entry=0x7fffedfe266b "constructor for actor failed", aIsParent=aIsParent@entry=false)
    at /home/sotaro/firefox_qr_dsk/mozilla-central/ipc/glue/ProtocolUtils.cpp:306
#4  0x00007fffebb01a0d in mozilla::dom::ContentChild::FatalErrorIfNotUsingGPUProcess (
    aProtocolName=0x7fffedfe723e "PCompositorBridgeChild", aErrorMsg=0x7fffedfe266b "constructor for actor failed", aOtherPid=26129)
    at /home/sotaro/firefox_qr_dsk/mozilla-central/dom/ipc/ContentChild.cpp:3316
#5  0x00007fffea3cbe21 in mozilla::layers::PCompositorBridgeChild::SendPWebRenderBridgeConstructor (this=this@entry=0x7fffcbf64800, 
    actor=<optimized out>, pipelineId=..., aSize=..., textureFactoryIdentifier=textureFactoryIdentifier@entry=0x7fffffff8b10, 
    idNamespace=idNamespace@entry=0x7fffffff8adc)
    at /home/sotaro/firefox_qr_dsk/mozilla-central/obj-x86_64-pc-linux-gnu/ipc/ipdl/PCompositorBridgeChild.cpp:1263
#6  0x00007fffea3cc22e in mozilla::layers::PCompositorBridgeChild::SendPWebRenderBridgeConstructor (this=this@entry=0x7fffcbf64800, 
    pipelineId=..., aSize=..., textureFactoryIdentifier=textureFactoryIdentifier@entry=0x7fffffff8b10, 
    idNamespace=idNamespace@entry=0x7fffffff8adc)
    at /home/sotaro/firefox_qr_dsk/mozilla-central/obj-x86_64-pc-linux-gnu/ipc/ipdl/PCompositorBridgeChild.cpp:1199
#7  0x00007fffea7e3893 in mozilla::layers::WebRenderLayerManager::Initialize (this=0x7fffcbf63400, aCBChild=0x7fffcbf64800, 
    aLayersId=..., aTextureFactoryIdentifier=aTextureFactoryIdentifier@entry=0x7fffffff8d10)
    at /home/sotaro/firefox_qr_dsk/mozilla-central/gfx/layers/wr/WebRenderLayerManager.cpp:64
#8  0x00007fffebd639e9 in nsBaseWidget::CreateCompositor (this=0x7fffd0063000, aWidth=<optimized out>, aHeight=<optimized out>)
    at /home/sotaro/firefox_qr_dsk/mozilla-central/widget/nsBaseWidget.cpp:1353
#9  0x00007fffebd5f844 in nsBaseWidget::GetLayerManager (this=0x7fffd0063000, aShadowManager=<optimized out>,
See Also: → 1384548
Attachment #8890705 - Attachment is obsolete: true
Attachment #8890730 - Flags: review?(bugmail)
Comment on attachment 8890730 [details] [diff] [review]
patch - Add WebRender creation failure handling

Review of attachment 8890730 [details] [diff] [review]:
-----------------------------------------------------------------

Looks reasonable but I'd rather have Andrew review since he last touched this code. See also my comment below.

::: widget/nsBaseWidget.cpp
@@ +1308,4 @@
>        *aOptionsOut = options;
>        return lm.forget();
>      }
> +    DestroyCompositor();

It seems more correct to me to move this DestroyCompositor call up inside the WR LayersBackend check where you set retry = true. Then you also don't need to change the mCompositorSession || !retry to a && condition.
Attachment #8890730 - Flags: review?(bugmail) → review?(aosmond)
Comment on attachment 8890730 [details] [diff] [review]
patch - Add WebRender creation failure handling

Review of attachment 8890730 [details] [diff] [review]:
-----------------------------------------------------------------

(In reply to Sotaro Ikeda [:sotaro] from comment #5)
> By changing source, if CompositorBridgeParent::AllocPWebRenderBridgeParent()
> returned nulptr, it caused crash at
> PCompositorBridgeChild::SendPWebRenderBridgeConstructor(). Then it is better
> to return valid PWebRenderBridgeParent*.
>

Am I reading this stack trace correctly? I was confused as to why I wasn't seeing this crash myself, and how exactly this patch would fix it, but it looks like WR lives inside the UI process instead of a separate GPU process. Shouldn't gfxPlatform::NotifyGPUProcessDisabled() have prevented this?
(In reply to Andrew Osmond [:aosmond] from comment #10)
> Comment on attachment 8890730 [details] [diff] [review]
> patch - Add WebRender creation failure handling
> 
> Review of attachment 8890730 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> (In reply to Sotaro Ikeda [:sotaro] from comment #5)
> > By changing source, if CompositorBridgeParent::AllocPWebRenderBridgeParent()
> > returned nulptr, it caused crash at
> > PCompositorBridgeChild::SendPWebRenderBridgeConstructor(). Then it is better
> > to return valid PWebRenderBridgeParent*.
> >
> 
> Am I reading this stack trace correctly? I was confused as to why I wasn't
> seeing this crash myself, and how exactly this patch would fix it, but it
> looks like WR lives inside the UI process instead of a separate GPU process.
> Shouldn't gfxPlatform::NotifyGPUProcessDisabled() have prevented this?

Hm, I guess it will start up okay if you force WebRender on without enabling the GPU process. And while we will want the GPU process on Linux (at least, I always run with it on to be closer to Windows behaviour), I suppose this isn't an option for Mac OS X so we will want to handle this configuration sanely...
(In reply to Andrew Osmond [:aosmond] from comment #10)
> 
> Am I reading this stack trace correctly? I was confused as to why I wasn't
> seeing this crash myself, and how exactly this patch would fix it, but it
> looks like WR lives inside the UI process instead of a separate GPU process.
> Shouldn't gfxPlatform::NotifyGPUProcessDisabled() have prevented this?

I got the stack on linux. GPU process is enabled only on windows. On windows, webrender is enabled only when gpu-process is enabled to avoid conflicts of ANGLE usage.
Comment on attachment 8890730 [details] [diff] [review]
patch - Add WebRender creation failure handling

Review of attachment 8890730 [details] [diff] [review]:
-----------------------------------------------------------------

This is above and beyond this patch, but it would be nice if IPDL was more nuanced in its CompositorManagerChild::HandleFatalError call. I think that is where the true error lies.

Having the option to not crash on some constructor failures would be nice, because we know we added code to handle that scenario. Then we wouldn't need to handle both a nullptr case (if the GPU process actually crashes) and a non-nullptr but invalid case (if the UI/GPU process are combined, and we can't return nullptr like in the crash case).

::: widget/nsBaseWidget.cpp
@@ +1295,5 @@
> +        retry = true;
> +        // Disable WebRender
> +        gfx::gfxConfig::GetFeature(gfx::Feature::WEBRENDER).ForceDisable(
> +          gfx::FeatureStatus::Unavailable,
> +          "WebRender initialization is failed",

nit: Remove "is".
Attachment #8890730 - Flags: review?(aosmond) → review+
(In reply to Sotaro Ikeda [:sotaro] from comment #12)
> (In reply to Andrew Osmond [:aosmond] from comment #10)
> > 
> > Am I reading this stack trace correctly? I was confused as to why I wasn't
> > seeing this crash myself, and how exactly this patch would fix it, but it
> > looks like WR lives inside the UI process instead of a separate GPU process.
> > Shouldn't gfxPlatform::NotifyGPUProcessDisabled() have prevented this?
> 
> I got the stack on linux. GPU process is enabled only on windows. On
> windows, webrender is enabled only when gpu-process is enabled to avoid
> conflicts of ANGLE usage.

The first thing I do with every clobber build is force enable the GPU process. It is second nature to me now :).
Applied the comments.
Attachment #8890730 - Attachment is obsolete: true
Attachment #8891229 - Flags: review+
Pushed by sikeda@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/e781433714c4
Add WebRender creation failure handling r=aosmond
Pushed by kgupta@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/8d6a86192b79
Followup to fix typo in function name. r=me
(I was too lazy to file a new bug so I put the typo fix here, because this patch is how I noticed it... but it's otherwise technically unrelated).
https://hg.mozilla.org/mozilla-central/rev/e781433714c4
https://hg.mozilla.org/mozilla-central/rev/8d6a86192b79
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Duplicate of this bug: 1384548
Duplicate of this bug: 1352574
Duplicate of this bug: 1350404
Duplicate of this bug: 1379545
You need to log in before you can comment on or make changes to this bug.