Compositor crash in libX11.so.6.3.0@0x39099

RESOLVED FIXED in Firefox 66

Status


P3
critical
RESOLVED FIXED
a year ago
12 days ago

People

(Reporter: darkspirit, Assigned: sotaro)

Tracking

(Blocks: 2 bugs, {crash, nightly-community, regression})

Trunk
mozilla66
x86_64
Linux
crash, nightly-community, regression

Firefox Tracking Flags

(firefox-esr60 unaffected, firefox57 disabled, firefox58 disabled, firefox63 disabled, firefox64 disabled, firefox65 disabled, firefox66 fixed)

Details

(Whiteboard: [wr-reserve], crash signature)

Attachments

(1 attachment, 2 obsolete attachments)

Nightly 58 x64 20171106100122 de_DE 179dae92e4d794e7f45ad080ff01908c80691f31 @ Debian Testing (KDE, Radeon RX480)
main profile: gpu process, layers force accel, webrender, blob-images, stylo-chrome

bp-4a57e40b-8663-46f0-910d-177270171107 07.11.17 02:08

(The other libX11 crash that can happen is bug 1372243 comment 9. Apart from these and bug 1412545, everything is fine.)
Sotaro, any idea what this might be?
Component: Graphics: Layers → Graphics: WebRender
Flags: needinfo?(sotaro.ikeda.g)
Most of these crash reports don't contain WR+?

https://crash-stats.mozilla.com/report/index/d5768cc1-0bc5-44e0-bc6a-af2460171006

Started on WR but had to fall back (from the graphics critical log). I believe I have produced similar reports in the past, but only when it has gone into fallback.
Whiteboard: [wr-mvp] [triage]
(Assignee)

Comment 3

a year ago
(In reply to Jan Andre Ikenmeyer [:darkspirit] from comment #0)
> Nightly 58 x64 20171106100122 de_DE 179dae92e4d794e7f45ad080ff01908c80691f31
> @ Debian Testing (KDE, Radeon RX480)
> main profile: gpu process, layers force accel, webrender, blob-images,
> stylo-chrome
> 
> bp-4a57e40b-8663-46f0-910d-177270171107 07.11.17 02:08
> 
> (The other libX11 crash that can happen is bug 1372243 comment 9. Apart from
> these and bug 1412545, everything is fine.)

The crash stack has TextureImageTextureSourceOGL::~TextureImageTextureSourceOGL, which means CompositorOGL was in use and WebRender was not.
main profile: gpu process (max 5000 restarts), layers force accel, webrender, blob-images, stylo-chrome
> bp-75bb7765-3ede-43ac-bd25-aec230171107 07.11.17 02:32
> bp-4a57e40b-8663-46f0-910d-177270171107 07.11.17 02:08 <--- the quoted crash. standalone
> bp-fcb1d337-8eae-4091-bb32-447840171106 06.11.17 19:21

Hm. Thoughts:
A)	I disable webrender (but not layers.acceleration.force-enabled) for some minutes to test something
	and get this crash without noticing.
	If I temporarily disable webrender in my main profile, I will also disable layers.acceleration.force-enabled
	in the future. (Why don't you switch this internally for Linux? ;-)

B)	Without any corresponding crash, WebRender sometimes lets me fall back to OpenGL Compositing which is a bit
	buggy(?) and crashes.
	How could this be prevented?
	My understanding is that I should only use WebRender and in special cases mixed with BasicLayerManager
	(bug 1390741 on Windows, and maybe bug 1377321 later), but never OpenGL or mixed with OpenGL.
	The crash report speaks of the gpu process: It's unclear to me whether my main WebRender
	or a panel of a webextension (for example) could have caused this.
	

My crash report from comment 0 lets me think it's B:

https://crash-stats.mozilla.com/report/index/4a57e40b-8663-46f0-910d-177270171107#tab-metadata
> GraphicsCriticalError |[G0][GFX1-]: Failed GL context creation for WebRender: 0 (t=1060.77)
> |[G16][GFX1-]: [OPENGL] Failed to init compositor with reason: FEATURE_FAILURE_OPENGL_CREATE_CONTEXT (t=2987.38)
> |[G2][GFX1-]: [OPENGL] Failed to init compositor with reason: FEATURE_FAILURE_OPENGL_CREATE_CONTEXT (t=1062.19)
> |[G3][GFX1-]: [OPENGL] Failed to init compositor with reason: FEATURE_FAILURE_OPENGL_CREATE_CONTEXT (t=1091.21)
> |[G4][GFX1-]: [OPENGL] Failed to init compositor with reason: FEATURE_FAILURE_OPENGL_CREATE_CONTEXT (t=1182.59)
> |[G5][GFX1-]: [OPENGL] Failed to init compositor with reason: FEATURE_FAILURE_OPENGL_CREATE_CONTEXT (t=1211.51)
> |[G6][GFX1-]: [OPENGL] Failed to init compositor with reason: FEATURE_FAILURE_OPENGL_CREATE_CONTEXT (t=1284.35)
> |[G7][GFX1-]: [OPENGL] Failed to init compositor with reason: FEATURE_FAILURE_OPENGL_CREATE_CONTEXT (t=1323.58)
> |[G8][GFX1-]: [OPENGL] Failed to init compositor with reason: FEATURE_FAILURE_OPENGL_CREATE_CONTEXT (t=1326.13)
> |[G9][GFX1-]: [OPENGL] Failed to init compositor with reason: FEATURE_FAILURE_OPENGL_CREATE_CONTEXT (t=1331.01)
> |[G10][GFX1-]: [OPENGL] Failed to init compositor with reason: FEATURE_FAILURE_OPENGL_CREATE_CONTEXT (t=1336.8)
> |[G11][GFX1-]: [OPENGL] Failed to init compositor with reason: FEATURE_FAILURE_OPENGL_CREATE_CONTEXT (t=1342.08)
> |[G12][GFX1-]: [OPENGL] Failed to init compositor with reason: FEATURE_FAILURE_OPENGL_CREATE_CONTEXT (t=1392.53)
> |[G13][GFX1-]: [OPENGL] Failed to init compositor with reason: FEATURE_FAILURE_OPENGL_CREATE_CONTEXT (t=1403.07)
> |[G14][GFX1-]: [OPENGL] Failed to init compositor with reason: FEATURE_FAILURE_OPENGL_CREATE_CONTEXT (t=1406.21)
> |[G15][GFX1-]: [OPENGL] Failed to init compositor with reason: FEATURE_FAILURE_OPENGL_CREATE_CONTEXT (t=2965.43)

https://crash-stats.mozilla.com/report/index/4a57e40b-8663-46f0-910d-177270171107#tab-telemetryenvironment
> "compositor": "webrender"

So am I still using WebRender(?),
but either Awesomebar, a webextension panel, about:profiles>Create new profile or Help>About falls back to OpenGL and crashes then? (Because of bug 1411503 comment 3 I think that they are different instances of WebRender.)
I have seen something like this in bug 1406230 comment 3, but it's not exactly the same.
OOP Webextensions are not enabled and we don't see "Compositors might be mixed" in the quoted log. I can't help here because of lack of competence, so I will shut up now.
(Assignee)

Comment 6

a year ago
(In reply to Jan Andre Ikenmeyer [:darkspirit] from comment #4)
> 
> https://crash-stats.mozilla.com/report/index/4a57e40b-8663-46f0-910d-
> 177270171107#tab-metadata
> > GraphicsCriticalError |[G0][GFX1-]: Failed GL context creation for WebRender: 0 (t=1060.77)

The log says that Gecko failed to create a GL context for WebRender, which triggered a fallback to the normal compositor.

> [G3][GFX1-]: [OPENGL] Failed to init compositor with reason: FEATURE_FAILURE_OPENGL_CREATE_CONTEXT (t=1091.21)

The error log is related to CompositorOGL.
  https://dxr.mozilla.org/mozilla-central/source/gfx/layers/opengl/CompositorOGL.cpp#242
Flags: needinfo?(sotaro.ikeda.g)
(Assignee)

Comment 7

a year ago
The crash seemed to happen because the GDK window was destroyed before the related GLContextGLX::~GLContextGLX() was called. I saw such a crash at an early stage of the WebRenderBridgeParent implementation.
(Assignee)

Comment 8

a year ago
STR
I could easily reproduce the crash with the following STR on Linux
[1] Enable CompositorOGL and GPU process, then restart Firefox
[2] Open multiple tabs on a window.
[3] Drag one tab and open new window.
[4] Close the opened window.

   [4] caused crash.
(Assignee)

Comment 9

a year ago
(In reply to Sotaro Ikeda [:sotaro] from comment #8)
> I could easily reproduce the crash with the following STR on Linux
> [1] Enable CompositorOGL and GPU process, then restart Firefox
> [2] Open multiple tabs on a window.
> [3] Drag one tab and open new window.
> [4] Close the opened window.
> 
>    [4] caused crash.

Destruction of GLContextGLX was deferred because BufferTextureHost held TextureImageTextureSourceOGL as TextureSource.
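
To illustrate the ordering problem described above, here is a minimal sketch that models it with std::shared_ptr. All type and function names (FakeGLContext, FakeTextureSource, FakeTextureHost, SimulateWindowClose) are hypothetical stand-ins and not the real Gecko classes; Gecko uses RefPtr, and the real objects carry far more state. It only shows how one extra strong reference can push the context's destruction past the point where the window goes away.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins; none of these are the real Gecko classes.
struct EventLog {
  std::vector<std::string> events;
};

struct FakeGLContext {
  explicit FakeGLContext(EventLog& aLog) : log(aLog) {}
  // In the real crash, this is roughly where the already-destroyed
  // window would be touched.
  ~FakeGLContext() { log.events.push_back("context destroyed"); }
  EventLog& log;
};

// Models a texture source that keeps a strong reference to its context.
struct FakeTextureSource {
  std::shared_ptr<FakeGLContext> gl;
};

// Models a texture host that can outlive the window's compositor.
struct FakeTextureHost {
  std::shared_ptr<FakeTextureSource> source;
};

// Returns the order of events when a window closes while a host still
// holds a texture source.
std::vector<std::string> SimulateWindowClose() {
  EventLog log;
  {
    FakeTextureHost host;
    {
      auto gl = std::make_shared<FakeGLContext>(log);
      host.source = std::make_shared<FakeTextureSource>();
      host.source->gl = gl;
      // The compositor's own reference (gl) is dropped at this brace.
    }
    // Only the host -> source chain keeps the context alive now.
    assert(host.source->gl.use_count() == 1);
    log.events.push_back("window destroyed");
    // The host is released at the next brace, and only then does the
    // context destructor run.
  }
  return log.events;
}
```

Calling SimulateWindowClose() in this model yields "window destroyed" followed by "context destroyed", which is the order that seems to be behind the crash.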
(Assignee)

Updated

a year ago
See Also: → bug 1372243
Blocks: 1386669
status-firefox57: --- → unaffected
status-firefox58: affected → unaffected
Priority: -- → P2
(Reporter)

Updated

a year ago
Blocks: 1357819
Whiteboard: [wr-mvp] [triage] → [wr-mvp]
Priority: P2 → P3
Whiteboard: [wr-mvp] → [wr-reserve]
Seen on Socorro:
> "compositor": "webrender",
bp-5745ec6c-961c-47a3-b2aa-fb20c0171213
Crash Signature: [@ libX11.so.6.3.0@0x39099 ] → [@ libX11.so.6.3.0@0x39099 ] [@ libX11.so.6.3.0@0x431cb ]
(Reporter)

Comment 11

10 months ago
(In reply to Sotaro Ikeda [:sotaro] from comment #8)
> I could easily reproduce the crash with the following STR on Linux
> [1] Enable CompositorOGL and GPU process, then restart Firefox
> [2] Open multiple tabs on a window.
> [3] Drag one tab and open new window.
> [4] Close the opened window.
> 
>    [4] caused crash.

layers.acceleration.force-enabled + layers.gpu-process.enabled = bp-302940d2-68e1-465e-8e88-8b8290180606

But it doesn't seem to happen with WR anymore? Maybe I can find a regression range.
Crash Signature: [@ libX11.so.6.3.0@0x39099 ] [@ libX11.so.6.3.0@0x431cb ] → [@ libX11.so.6.3.0@0x39099 ] [@ libX11.so.6.3.0@0x431cb ] [@ libX11.so.6.3.0@0x39c49 ]
Flags: needinfo?(jan)
Linux doesn't block release and this seems to be rare.
Blocks: 1386674
No longer blocks: 1386669
Closing because no crashes reported for 12 weeks.
Status: NEW → RESOLVED
Last Resolved: 4 months ago
Resolution: --- → WONTFIX
(Reporter)

Comment 15

4 months ago
(In reply to Release mgmt bot [:sylvestre / :calixte] from comment #14)
> Closing because no crashes reported for 12 weeks.

Not reported and just being patient.

(Sotaro Ikeda [:sotaro] from comment #8)
> I could easily reproduce the crash with the following STR on Linux
> [1] Enable CompositorOGL and GPU process, then restart Firefox
> [2] Open multiple tabs on a window.
> [3] Drag one tab and open new window.
> [4] Close the opened window.
> 
>    [4] caused crash.

bp-854d6d37-15a0-4f8e-95fe-84a5a0181124
Status: RESOLVED → REOPENED
Crash Signature: [@ libX11.so.6.3.0@0x39099 ] [@ libX11.so.6.3.0@0x431cb ] [@ libX11.so.6.3.0@0x39c49 ] → [@ libX11.so.6.3.0@0x39099 ] [@ libX11.so.6.3.0@0x431cb ] [@ libX11.so.6.3.0@0x39c49 ] [@ libX11.so.6.3.0@0x3a009 ]
status-firefox57: unaffected → disabled
status-firefox58: unaffected → disabled
status-firefox63: --- → disabled
status-firefox64: --- → disabled
status-firefox65: --- → disabled
Resolution: WONTFIX → ---
(Assignee)

Updated

4 months ago
Assignee: nobody → sotaro.ikeda.g
(Assignee)

Comment 16

4 months ago
(In reply to Sotaro Ikeda [:sotaro] from comment #9)
> (In reply to Sotaro Ikeda [:sotaro] from comment #8)
> > I could easily reproduce the crash with the following STR on Linux
> > [1] Enable CompositorOGL and GPU process, then restart Firefox
> > [2] Open multiple tabs on a window.
> > [3] Drag one tab and open new window.
> > [4] Close the opened window.
> > 
> >    [4] caused crash.
> 
> Destruction of GLContextGLX was deferred because BufferTextureHost held
> TextureImageTextureSourceOGL as TextureSource.

We could avoid the problem by using WeakPtr<gl::GLContext> in TextureImageTextureSourceOGL.
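
A rough sketch of that idea, using std::weak_ptr as a stand-in for mozilla::WeakPtr (the two have different APIs, so this is only an analogy). FakeGLContext, FakeTextureSource and ContextDiesWithCompositor are hypothetical names made up for illustration.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a GL context.
struct FakeGLContext {
  bool current = false;
  void MakeCurrent() { current = true; }
};

// Models a texture source that only observes the context instead of
// keeping it alive.
class FakeTextureSource {
 public:
  explicit FakeTextureSource(const std::shared_ptr<FakeGLContext>& aGL)
      : mGL(aGL) {}

  // Returns true if the context was still alive and the GL work ran.
  bool DeleteTexture() {
    if (auto gl = mGL.lock()) {
      gl->MakeCurrent();
      return true;
    }
    return false;  // Context already gone; nothing to touch.
  }

  bool HasContext() const { return !mGL.expired(); }

 private:
  std::weak_ptr<FakeGLContext> mGL;  // Non-owning reference.
};

// Returns true if dropping the compositor's reference destroys the
// context even though a texture source still exists.
bool ContextDiesWithCompositor() {
  auto gl = std::make_shared<FakeGLContext>();
  FakeTextureSource source(gl);
  assert(source.HasContext());
  gl.reset();  // The compositor drops the only strong reference.
  return !source.HasContext() && !source.DeleteTexture();
}
```

With a non-owning reference, the texture source can no longer delay destruction of the context; it just has to check that the context is still there before using it.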
(Assignee)

Comment 19

4 months ago
attachment 9028169 [details] [diff] [review] removes RefPtr<gl::GLContext>, but TextureImageTextureSourceOGL still holds RefPtr<gl::TextureImage> that holds RefPtr<gl::GLContext> :(
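
The remaining problem can be sketched like this, again with std::shared_ptr and std::weak_ptr standing in for RefPtr and WeakPtr. The types and the function ContextSurvivesViaImage are hypothetical and only model the ownership chain named above.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for the ownership chain described above.
struct FakeGLContext {};

struct FakeTextureImage {
  std::shared_ptr<FakeGLContext> gl;  // Strong reference to the context.
};

struct FakeTextureSource {
  std::weak_ptr<FakeGLContext> gl;          // Weak after the first patch.
  std::shared_ptr<FakeTextureImage> image;  // Still a strong reference.
};

// Returns true if the context stays alive only because of the image.
bool ContextSurvivesViaImage() {
  auto gl = std::make_shared<FakeGLContext>();
  auto image = std::make_shared<FakeTextureImage>();
  image->gl = gl;

  FakeTextureSource source;
  source.gl = gl;
  source.image = image;

  image.reset();  // Drop the local references, as a compositor would.
  gl.reset();
  assert(source.image.use_count() == 1);

  // The weak member alone would not keep the context alive, but the
  // image held by the source still does.
  bool aliveWithImage = !source.gl.expired();

  source.image.reset();
  bool aliveWithoutImage = !source.gl.expired();

  return aliveWithImage && !aliveWithoutImage;
}
```

So making one member weak is not enough as long as another member transitively owns the context.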
(Assignee)

Updated

4 months ago
Attachment #9028169 - Attachment is obsolete: true
Comment hidden (obsolete)
(Assignee)

Comment 21

4 months ago
I have another idea for addressing it. I am going to try it.
(Assignee)

Comment 22

4 months ago
I confirmed that the patch addresses the crash.
(Assignee)

Updated

4 months ago
Attachment #9028549 - Flags: review?(nical.bugzilla)
Attachment #9028549 - Flags: review?(nical.bugzilla) → review+
(Reporter)

Comment 24

4 months ago
checkin-needed?
(Assignee)

Comment 25

4 months ago
I am thinking of checking it in after trunk becomes Firefox 66.

Comment 27

3 months ago
Pushed by sikeda@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/ceee7e820638
Deallocate device data of TextureImageTextureSourceOGL during destroying CompositorOGL r=nical
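
Going only by the commit title, the approach might look roughly like the sketch below. This is not the landed patch. The classes, the weak bookkeeping list and ContextReleasedOnDestroy are assumptions made for illustration, and the method name DeallocateDeviceData is simply borrowed from the commit title.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-ins; not the real Gecko classes.
struct FakeGLContext {};

struct FakeTextureImage {
  std::shared_ptr<FakeGLContext> gl;  // Strong reference to the context.
};

class FakeTextureSource {
 public:
  explicit FakeTextureSource(std::shared_ptr<FakeTextureImage> aImage)
      : mImage(std::move(aImage)) {}

  // Drops the GL-backed data while the context is still valid.
  void DeallocateDeviceData() { mImage.reset(); }
  bool HasDeviceData() const { return mImage != nullptr; }

 private:
  std::shared_ptr<FakeTextureImage> mImage;
};

class FakeCompositor {
 public:
  FakeCompositor() : mGL(std::make_shared<FakeGLContext>()) {}

  std::shared_ptr<FakeTextureSource> CreateTextureSource() {
    auto image = std::make_shared<FakeTextureImage>();
    image->gl = mGL;
    auto source = std::make_shared<FakeTextureSource>(std::move(image));
    mSources.push_back(source);  // Tracked weakly, not owned.
    return source;
  }

  // On destroy, release device data of every live texture source so
  // that nothing keeps the context alive past this point.
  void Destroy() {
    for (auto& weak : mSources) {
      if (auto source = weak.lock()) {
        source->DeallocateDeviceData();
      }
    }
    mSources.clear();
    mGL.reset();
  }

  std::weak_ptr<FakeGLContext> WatchContext() const { return mGL; }

 private:
  std::shared_ptr<FakeGLContext> mGL;
  std::vector<std::weak_ptr<FakeTextureSource>> mSources;
};

// Returns true if the context is gone right after Destroy(), even
// though a host-side reference to the texture source still exists.
bool ContextReleasedOnDestroy() {
  FakeCompositor compositor;
  auto watcher = compositor.WatchContext();
  auto heldByHost = compositor.CreateTextureSource();
  assert(heldByHost->HasDeviceData());
  compositor.Destroy();
  return watcher.expired() && !heldByHost->HasDeviceData();
}
```

In this model the texture source object may live on, but it no longer owns anything tied to the GL context once the compositor has been destroyed.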

Comment 28

3 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/ceee7e820638
Status: REOPENED → RESOLVED
Last Resolved: 4 months ago → 3 months ago
status-firefox66: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla66
status-firefox-esr60: --- → unaffected
Flags: qe-verify+
Whiteboard: [wr-reserve] → [wr-reserve][qa-triaged]
QA Whiteboard: [qa-triaged]
Whiteboard: [wr-reserve][qa-triaged] → [wr-reserve]

I could not reproduce this issue using Fx Nightly 58 x64 (20171106100122), on Ubuntu 16.04 LTS (Radeon RX480). Is it a Debian specific issue?

Flags: needinfo?(jan)

Sorry, I can only point to comment 11. Try with build 2018-06-06 or 2018-12-10. You don't need to enable WebRender for this; the GPU process was just incredibly helpful for WebRender stability.

Flags: needinfo?(jan)
Keywords: regression

Unfortunately, I could not reproduce this issue. I suspect it is due to environmental differences.

Flags: qe-verify+