Closed Bug 1494430 Opened 3 years ago Closed 3 years ago

Memory reporters crash with webrender enabled

Categories

(Core :: Graphics: WebRender, defect, P1)

x86_64
Windows 10

Tracking

()

RESOLVED FIXED
mozilla64
Tracking Status
firefox64 --- fixed

People

(Reporter: bugzilla.mozilla.org, Assigned: bholley)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

Recently, about:memory -> measure has been crashing while I was attempting to gather reports for bug 1378528.
The crash is intermittent.

https://crash-stats.mozilla.com/report/index/a5b4c08e-be34-4a61-9511-b50e30180922
https://crash-stats.mozilla.com/report/index/25be0879-304a-4f72-8390-73bd40180926

Maybe this was introduced by the additional reporters added in bug 1492930?
Assignee: nobody → bobbyholley
Priority: -- → P1
I don't think it was bug 1492930. None of my reporters run on the render thread. It looks like that thread got torn down before the compositor thread. I see the GPU process kept getting relaunched; is it possible that we switched from WebRender to the basic compositor (non-WebRender) when it eventually gave up, and that the render thread was shut down as a result?
I couldn't reproduce this, not even with image.mem.debug-reporting set to true.
I think we can wallpaper over this fairly easily; I can write a patch.

The interesting bit is _why_ the Render thread isn't there.

(In reply to Andrew Osmond [:aosmond] from comment #1)
> I see the GPU process kept getting relaunched, it is possible we switched from
> WebRender to the basic compositor / non-WebRender when it eventually gave up
> and thus possibly the render thread was shutdown?

Where do you see that?
Flags: needinfo?(aosmond)
Comment on attachment 9012357 [details]
Handle torn-down Render thread when getting memory reports.

Jeff Muizelaar [:jrmuizel] has approved the revision.
Attachment #9012357 - Flags: review+
Pushed by bholley@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/aceaf58aca57
Handle torn-down Render thread when getting memory reports. r=jrmuizel
On nightly, we are willing to launch the GPU process 3 extra times before falling back to putting gfx code in the parent process, for a total of 4: https://searchfox.org/mozilla-central/rev/ce57be88b8aa2ad03ace1b9684cd6c361be5109f/modules/libpref/init/all.js#4856

From the GPUProcessLaunchCount metadata, I see the number 4, and a GPUProcessStatus of "Destroyed" which tells me it gave up on the GPU process and will now try to run the compositor in the parent process. I'm trying to figure out where it disables WebRender on this code path but I don't see it, so maybe I misremember, or it has changed...
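
To make the launch-count reasoning concrete, here is a minimal sketch of the "3 extra launches, 4 in total, then fall back to the parent process" policy described above. All names (GpuLaunchPolicy, CompositorHost, OnLaunchNeeded) are hypothetical and chosen for illustration only; this is not Gecko's actual GPU process management code, and the default of 3 restarts is an assumption taken from this comment.

```cpp
#include <cstdint>

// Hypothetical: where the compositor ends up running.
enum class CompositorHost { GpuProcess, ParentProcess };

// Hypothetical model of the relaunch policy described in the comment.
struct GpuLaunchPolicy {
  std::uint32_t maxRestarts = 3;  // assumed: extra launches after the first
  std::uint32_t launchCount = 0;  // analogous to GPUProcessLaunchCount

  // Total launches allowed: the initial launch plus the restarts.
  std::uint32_t MaxLaunches() const { return 1 + maxRestarts; }

  // Called whenever a GPU process is needed (at startup or after a crash).
  // Returns where compositing should run from now on.
  CompositorHost OnLaunchNeeded() {
    if (launchCount < MaxLaunches()) {
      ++launchCount;
      return CompositorHost::GpuProcess;
    }
    // Budget exhausted: give up on the GPU process.
    return CompositorHost::ParentProcess;
  }
};
```

Under these assumptions, a launch count of 4 together with a "Destroyed" status corresponds to the state in which the fifth request is answered with ParentProcess.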

In any event, from the GraphicsCriticalError metadata, I see "[D3D11] failed to get compositor device." which is probably triggered here:

https://searchfox.org/mozilla-central/rev/ce57be88b8aa2ad03ace1b9684cd6c361be5109f/gfx/webrender_bindings/RenderCompositorANGLE.cpp#124

This will bubble up to:

https://searchfox.org/mozilla-central/rev/ce57be88b8aa2ad03ace1b9684cd6c361be5109f/widget/nsBaseWidget.cpp#1356

And that disables WebRender when we would otherwise have started with it.
Flags: needinfo?(aosmond)
(In reply to Andrew Osmond [:aosmond] from comment #7)
> On nightly, we are willing to launch the GPU process 3 extra times before
> falling back to putting gfx code in the parent process, for a total of 4:
> https://searchfox.org/mozilla-central/rev/
> ce57be88b8aa2ad03ace1b9684cd6c361be5109f/modules/libpref/init/all.js#4856
> 
> From the GPUProcessLaunchCount metadata, I see the number 4, and a
> GPUProcessStatus of "Destroyed" which tells me it gave up on the GPU process
> and will now try to run the compositor in the parent process. I'm trying to
> figure out where it disables WebRender on this code path but I don't see it,
> so maybe I misremember, or it has changed...

Ah! I needed to be logged in to see that piece of metadata.

> In any event, from the GraphicsCriticalError metadata, I see "[D3D11] failed
> to get compositor device." which is probably be triggered here:

I think the [D3D11] is a copy-paste error, since we're using ANGLE here.
 
> https://searchfox.org/mozilla-central/rev/
> ce57be88b8aa2ad03ace1b9684cd6c361be5109f/gfx/webrender_bindings/
> RenderCompositorANGLE.cpp#124

Hm! So it seems like the reporter isn't able to start the GPU process, which seems bad. Ryan, Sotaro - any thoughts on this? Should we investigate this further with the reporter?
Flags: needinfo?(sotaro.ikeda.g)
Flags: needinfo?(rhunt)
https://hg.mozilla.org/mozilla-central/rev/aceaf58aca57
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla64
(In reply to Bobby Holley (:bholley) from comment #8)
> 
> > In any event, from the GraphicsCriticalError metadata, I see "[D3D11] failed
> > to get compositor device." which is probably be triggered here:
> 
> I think the [D3D11] is a copy-paste error, since we're using ANGLE here.

Yeah, it is a copy-paste error. I am going to update them. However, the error log does not seem to be related to WebRender. From other logs, it seems to be emitted by CompositorD3D11::Initialize(). The GraphicsCriticalError metadata seems to have lost the original error, since the error number was already [46] or so.
https://dxr.mozilla.org/mozilla-central/source/gfx/layers/d3d11/CompositorD3D11.cpp#121

The crash happened in the chrome process, but we expect RenderThread::AccumulateMemoryReport() to be called in the GPU process. On Windows, WebRender is enabled only in the GPU process. It seems that Gecko does not correctly handle the case where the connection to the GPU process fails.

> Hm! So it seems like the reporter isn't able to start the GPU process, which
> seems bad. Ryan, Sotaro - any thoughts on this? Should we investigate this
> further with the reporter?

Gecko does not correctly handle the case where the GPU process fails to start. Once that is addressed, the GraphicsCriticalError metadata should have cleaner error logs.
(In reply to Sotaro Ikeda [:sotaro] from comment #10)
> 
> Gecko does not handle correctly when GPU process did not start. If it is
> addressed, GraphicsCriticalError metadata is going to have cleaner error
> logs.

When the connection to the GPU process failed, Firefox always fell back to BasicCompositor with some graphics errors.
Depends on: 1494533
Depends on: 1494528
No longer depends on: 1494528
Depends on: 1494538
(In reply to Bobby Holley (:bholley) from comment #8)
> 
> Hm! So it seems like the reporter isn't able to start the GPU process, which
> seems bad. Ryan, Sotaro - any thoughts on this? Should we investigate this
> further with the reporter?

Regarding the crash in this bug, it seems clear that the parent process never has a RenderThread on Windows, since Gecko does not enable WebRender when the GPU process does not exist. Here, the GPU process was disabled after several crashes in the GPU process, so CompositorManagerParent was created in the parent process. Because the parent process does not have a RenderThread on Windows, that caused the crash.
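
The failure mode described here, and the kind of guard suggested by the patch title ("Handle torn-down Render thread when getting memory reports"), can be sketched roughly as follows. This is not the landed patch: FakeRenderThread, FakeProcess and CollectRenderThreadReport are hypothetical stand-ins, and the only idea borrowed from the bug is that the memory reporter must tolerate a process without a render thread.

```cpp
#include <cstddef>
#include <memory>

// Hypothetical stand-in for a render thread that can report its memory use.
class FakeRenderThread {
 public:
  explicit FakeRenderThread(std::size_t bytes) : mBytes(bytes) {}
  std::size_t AccumulateMemoryReport() const { return mBytes; }

 private:
  std::size_t mBytes;
};

// Hypothetical model of a process that may or may not host a render thread.
// In the scenario above, the parent process on Windows has none.
struct FakeProcess {
  std::unique_ptr<FakeRenderThread> renderThread;  // null when absent
};

// Writes the reported size to *outBytes and returns true when a render
// thread exists. Returns false and leaves *outBytes untouched otherwise,
// instead of dereferencing a null pointer.
bool CollectRenderThreadReport(const FakeProcess& process,
                               std::size_t* outBytes) {
  if (!process.renderThread) {
    return false;  // torn down or never created: skip this reporter
  }
  *outBytes = process.renderThread->AccumulateMemoryReport();
  return true;
}
```

Returning a flag rather than a zero-sized report keeps "no render thread" distinguishable from "render thread using no memory" in this sketch.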
Flags: needinfo?(sotaro.ikeda.g)
The 8472, how do you enable WebRender? Do you still disable ANGLE usage by setting the pref gfx.webrender.force-angle to false, as in Bug 1378528 comment 0?

We normally enable WebRender just by setting the pref gfx.webrender.all to true.
Flags: needinfo?(bugzilla.mozilla.org)
Flags: needinfo?(rhunt)
That all makes sense - great investigation Sotaro!
(In reply to Sotaro Ikeda [:sotaro] from comment #13)
> The 8472, how do you enable WebRender? Do you still disable angle usage by
> setting pref gfx.webrender.force-angle to false as in Bug 1378528 comment 0?

No, the option is currently at its default value (true).

> We normally enabled WebRender just by setting pref gfx.webrender.all to true.

Yes, I have also been using that setting for a while now; I occasionally check the WebRender status update blog.
Flags: needinfo?(bugzilla.mozilla.org)
(In reply to The 8472 from comment #15)
> (In reply to Sotaro Ikeda [:sotaro] from comment #13)
> > The 8472, how do you enable WebRender? Do you still disable angle usage by
> > setting pref gfx.webrender.force-angle to false as in Bug 1378528 comment 0?
> 
> No, the option is currently at its default value (true)
> 
> > We normally enabled WebRender just by setting pref gfx.webrender.all to true.
> 
> Yes, I have also been using that setting for a while now, I occasionally
> check the webrender status update blog.

Ok. And does about:support generally show "Compositing" as "Basic" or as "WebRender"? Those crash reports indicate that it would have reported "Basic" at the time you tried to take a memory report, because the GPU process had crashed 4 times. I'm curious as to whether it's crashing 4 times immediately on startup, or whether it's happening over time.
Flags: needinfo?(bugzilla.mozilla.org)
> And does about:support generally show "Compositing" as "Basic" or as "WebRender"?

The latter.
Flags: needinfo?(bugzilla.mozilla.org)