Closed Bug 1415020 Opened 3 years ago Closed 2 years ago

Compositor crash in libX11.so.6.3.0@0x39099

Categories

(Core :: Graphics: WebRender, defect, P3)

x86_64
Linux
defect

Tracking

()

RESOLVED FIXED
mozilla66
Tracking Status
firefox-esr60 --- unaffected
firefox57 --- disabled
firefox58 --- disabled
firefox63 --- disabled
firefox64 --- disabled
firefox65 --- disabled
firefox66 --- fixed

People

(Reporter: darkspirit, Assigned: sotaro)

References

(Blocks 2 open bugs)

Details

(Keywords: crash, nightly-community, regression, Whiteboard: [wr-reserve])

Attachments

(1 file, 2 obsolete files)

Nightly 58 x64 20171106100122 de_DE 179dae92e4d794e7f45ad080ff01908c80691f31 @ Debian Testing (KDE, Radeon RX480)
main profile: gpu process, layers force accel, webrender, blob-images, stylo-chrome

bp-4a57e40b-8663-46f0-910d-177270171107 07.11.17 02:08

(The other libX11 crash that can happen is bug 1372243 comment 9. Apart from these and bug 1412545, everything is fine.)
Sotaro, any idea what this might be?
Component: Graphics: Layers → Graphics: WebRender
Flags: needinfo?(sotaro.ikeda.g)
Most of these crash reports don't contain WR+?

https://crash-stats.mozilla.com/report/index/d5768cc1-0bc5-44e0-bc6a-af2460171006

Started on WR but had to fall back (from the graphics critical log). I believe I have produced similar reports in the past, but only when it has gone into fallback.
Whiteboard: [wr-mvp] [triage]
(In reply to Jan Andre Ikenmeyer [:darkspirit] from comment #0)
> Nightly 58 x64 20171106100122 de_DE 179dae92e4d794e7f45ad080ff01908c80691f31
> @ Debian Testing (KDE, Radeon RX480)
> main profile: gpu process, layers force accel, webrender, blob-images,
> stylo-chrome
> 
> bp-4a57e40b-8663-46f0-910d-177270171107 07.11.17 02:08
> 
> (The other libX11 crash that can happen is bug 1372243 comment 9. Apart from
> these and bug 1412545, everything is fine.)

The crash stack contains TextureImageTextureSourceOGL::~TextureImageTextureSourceOGL, which means CompositorOGL was in use and WebRender was not.
main profile: gpu process (max 5000 restarts), layers force accel, webrender, blob-images, stylo-chrome
> bp-75bb7765-3ede-43ac-bd25-aec230171107 07.11.17 02:32
> bp-4a57e40b-8663-46f0-910d-177270171107 07.11.17 02:08 <--- the quoted crash. standalone
> bp-fcb1d337-8eae-4091-bb32-447840171106 06.11.17 19:21

Hm. Thoughts:
A)	I disable webrender (but not layers.acceleration.force-enabled) for some minutes to test something
	and get this crash without noticing.
	If I temporarily disable webrender in my main profile, I will also disable layers.acceleration.force-enabled
	in the future. (Why don't you switch this internally for Linux? ;-)

B)	Without any corresponding crash, WebRender sometimes lets me fall back to OpenGL Compositing which is a bit
	buggy(?) and crashes.
	How could this be prevented?
	My understanding is that I should only use WebRender and in special cases mixed with BasicLayerManager
	(bug 1390741 on Windows, and maybe bug 1377321 later), but never OpenGL or mixed with OpenGL.
	The crash report speaks of the gpu process: It's unclear to me whether my main WebRender
	or a panel of a webextension (for example) could have caused this.
	

My crash report from comment 0 lets me think it's B:

https://crash-stats.mozilla.com/report/index/4a57e40b-8663-46f0-910d-177270171107#tab-metadata
> GraphicsCriticalError |[G0][GFX1-]: Failed GL context creation for WebRender: 0 (t=1060.77) |[G16][GFX1-]: [OPENGL] Failed to init compositor with  reason: FEATURE_FAILURE_OPENGL_CREATE_CONTEXT (t=2987.38) |[G2][GFX1-]: [OPENGL] Failed to init compositor with reason: FEATURE_FAILURE_OPENGL_CREATE_CONTEXT (t=1062.19) |[G3][GFX1-]: [OPENGL] Failed to init compositor with reason: FEATURE_FAILURE_OPENGL_CREATE_CONTEXT (t=1091.21) |[G4][GFX1-]: [OPENGL] Failed to init compositor with reason: FEATURE_FAILURE_OPENGL_CREATE_CONTEXT (t=1182.59) |[G5][GFX1-]: [OPENGL] Failed to init compositor with reason: FEATURE_FAILURE_OPENGL_CREATE_CONTEXT (t=1211.51) |[G6][GFX1-]: [OPENGL] Failed to init compositor with reason: FEATURE_FAILURE_OPENGL_CREATE_CONTEXT (t=1284.35) |[G7][GFX1-]: [OPENGL] Failed to init compositor with reason: FEATURE_FAILURE_OPENGL_CREATE_CONTEXT (t=1323.58) |[G8][GFX1-]: [OPENGL] Failed to init compositor with reason: FEATURE_FAILURE_OPENGL_CREATE_CONTEXT (t=1326.13) |[G9][GFX1-]: [OPENGL] Failed to init compositor with reason: FEATURE_FAILURE_OPENGL_CREATE_CONTEXT (t=1331.01) |[G10][GFX1-]: [OPENGL] Failed to init compositor with reason: FEATURE_FAILURE_OPENGL_CREATE_CONTEXT (t=1336.8) |[G11][GFX1-]: [OPENGL] Failed to init compositor with reason: FEATURE_FAILURE_OPENGL_CREATE_CONTEXT (t=1342.08) |[G12][GFX1-]: [OPENGL] Failed to init compositor with reason: FEATURE_FAILURE_OPENGL_CREATE_CONTEXT (t=1392.53) |[G13][GFX1-]: [OPENGL] Failed to init compositor with reason: FEATURE_FAILURE_OPENGL_CREATE_CONTEXT (t=1403.07) |[G14][GFX1-]: [OPENGL] Failed to init compositor with reason: FEATURE_FAILURE_OPENGL_CREATE_CONTEXT (t=1406.21) |[G15][GFX1-]: [OPENGL] Failed to init compositor with reason: FEATURE_FAILURE_OPENGL_CREATE_CONTEXT (t=2965.43)

https://crash-stats.mozilla.com/report/index/4a57e40b-8663-46f0-910d-177270171107#tab-telemetryenvironment
> "compositor": "webrender"

So am I still using WebRender(?),
but either Awesomebar, a webextension panel, about:profiles>Create new profile or Help>About falls back to OpenGL and crashes then? (Because of bug 1411503 comment 3 I think that they are different instances of WebRender.)
I have seen something like this in bug 1406230 comment 3, but it's not exactly the same.
OOP Webextensions are not enabled and we don't see "Compositors might be mixed" in the quoted log. I can't help here because of lack of competence, so I will shut up now.
(In reply to Jan Andre Ikenmeyer [:darkspirit] from comment #4)
> 
> https://crash-stats.mozilla.com/report/index/4a57e40b-8663-46f0-910d-
> 177270171107#tab-metadata
> > GraphicsCriticalError |[G0][GFX1-]: Failed GL context creation for WebRender: 0 (t=1060.77)

The log says that Gecko failed to create a GL context for WebRender, which triggered a fallback to the normal compositor.

> [G3][GFX1-]: [OPENGL] Failed to init compositor with reason: FEATURE_FAILURE_OPENGL_CREATE_CONTEXT (t=1091.21)

The error log is related to CompositorOGL.
  https://dxr.mozilla.org/mozilla-central/source/gfx/layers/opengl/CompositorOGL.cpp#242
Flags: needinfo?(sotaro.ikeda.g)
The crash seems to happen because the GDK window was destroyed before the related GLContextGLX::~GLContextGLX() was called. I saw such a crash at an early stage of the WebRenderBridgeParent implementation.
I could easily reproduce the crash with the following STR on Linux
[1] Enable CompositorOGL and GPU process, then restart Firefox
[2] Open multiple tabs on a window.
[3] Drag one tab and open new window.
[4] Close the opened window.

   [4] caused crash.
(In reply to Sotaro Ikeda [:sotaro] from comment #8)
> I could easily reproduce the crash with the following STR on Linux
> [1] Enable CompositorOGL and GPU process, then restart Firefox
> [2] Open multiple tabs on a window.
> [3] Drag one tab and open new window.
> [4] Close the opened window.
> 
>    [4] caused crash.

Destruction of GLContextGLX was deferred because BufferTextureHost held TextureImageTextureSourceOGL as TextureSource.
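
The ownership chain described above can be modelled in a few lines of standard C++. This is only a sketch: `GLContextModel`, `TextureSourceModel`, and `TextureHostModel` are hypothetical stand-ins for GLContextGLX, TextureImageTextureSourceOGL, and BufferTextureHost, and `std::shared_ptr` stands in for Gecko's RefPtr. It shows only the ordering problem (the context is destroyed after the window), not the actual X11 crash.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins; none of these are the real Gecko classes.
using EventLog = std::vector<std::string>;

struct GLContextModel {
  explicit GLContextModel(EventLog& log) : mLog(log) {}
  // Records when the context is actually torn down.
  ~GLContextModel() { mLog.push_back("context destroyed"); }
  EventLog& mLog;
};

// Models the texture source: keeps a strong reference to the context.
struct TextureSourceModel {
  std::shared_ptr<GLContextModel> mGL;
};

// Models the texture host: keeps the texture source alive.
struct TextureHostModel {
  std::shared_ptr<TextureSourceModel> mSource;
};

EventLog CloseWindowWhileHostIsAlive() {
  EventLog log;
  auto gl = std::make_shared<GLContextModel>(log);

  TextureHostModel host;
  host.mSource = std::make_shared<TextureSourceModel>();
  host.mSource->mGL = gl;
  assert(gl.use_count() == 2);  // compositor + texture source

  gl.reset();                         // the compositor drops its reference
  log.push_back("window destroyed");  // the window goes away first
  host.mSource.reset();               // the host lets go only later
  return log;
}
```

In this model the returned log lists the window destruction before the context destruction, which is the ordering that the real GLX context cannot tolerate.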
See Also: → 1372243
Whiteboard: [wr-mvp] [triage] → [wr-mvp]
Priority: P2 → P3
Whiteboard: [wr-mvp] → [wr-reserve]
Seen on Socorro:
> "compositor": "webrender",
bp-5745ec6c-961c-47a3-b2aa-fb20c0171213
Crash Signature: [@ libX11.so.6.3.0@0x39099 ] → [@ libX11.so.6.3.0@0x39099 ] [@ libX11.so.6.3.0@0x431cb ]
(In reply to Sotaro Ikeda [:sotaro] from comment #8)
> I could easily reproduce the crash with the following STR on Linux
> [1] Enable CompositorOGL and GPU process, then restart Firefox
> [2] Open multiple tabs on a window.
> [3] Drag one tab and open new window.
> [4] Close the opened window.
> 
>    [4] caused crash.

layers.acceleration.force-enabled + layers.gpu-process.enabled = bp-302940d2-68e1-465e-8e88-8b8290180606

But it doesn't seem to happen with WR anymore? Maybe I can find a range.
Crash Signature: [@ libX11.so.6.3.0@0x39099 ] [@ libX11.so.6.3.0@0x431cb ] → [@ libX11.so.6.3.0@0x39099 ] [@ libX11.so.6.3.0@0x431cb ] [@ libX11.so.6.3.0@0x39c49 ]
Flags: needinfo?(jan)
Linux doesn't block release and this seems to be rare.
Blocks: stage-wr-next
No longer blocks: stage-wr-trains
Closing because no crashes reported for 12 weeks.
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → WONTFIX
(In reply to Release mgmt bot [:sylvestre / :calixte] from comment #14)
> Closing because no crashes reported for 12 weeks.

Not reported and just being patient.

(Sotaro Ikeda [:sotaro] from comment #8)
> I could easily reproduce the crash with the following STR on Linux
> [1] Enable CompositorOGL and GPU process, then restart Firefox
> [2] Open multiple tabs on a window.
> [3] Drag one tab and open new window.
> [4] Close the opened window.
> 
>    [4] caused crash.

bp-854d6d37-15a0-4f8e-95fe-84a5a0181124
Status: RESOLVED → REOPENED
Crash Signature: [@ libX11.so.6.3.0@0x39099 ] [@ libX11.so.6.3.0@0x431cb ] [@ libX11.so.6.3.0@0x39c49 ] → [@ libX11.so.6.3.0@0x39099 ] [@ libX11.so.6.3.0@0x431cb ] [@ libX11.so.6.3.0@0x39c49 ] [@ libX11.so.6.3.0@0x3a009 ]
Resolution: WONTFIX → ---
Assignee: nobody → sotaro.ikeda.g
(In reply to Sotaro Ikeda [:sotaro] from comment #9)
> (In reply to Sotaro Ikeda [:sotaro] from comment #8)
> > I could easily reproduce the crash with the following STR on Linux
> > [1] Enable CompositorOGL and GPU process, then restart Firefox
> > [2] Open multiple tabs on a window.
> > [3] Drag one tab and open new window.
> > [4] Close the opened window.
> > 
> >    [4] caused crash.
> 
> Destruction of GLContextGLX was deferred because BufferTextureHost held
> TextureImageTextureSourceOGL as TextureSource.

We could avoid the problem by using WeakPtr<gl::GLContext> in TextureImageTextureSourceOGL.
Attachment 9028169 removes RefPtr<gl::GLContext>, but TextureImageTextureSourceOGL still holds a RefPtr<gl::TextureImage>, which in turn holds a RefPtr<gl::GLContext> :(
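
Why the weak pointer alone does not help can be sketched as follows. The types are hypothetical models, not the Gecko classes, and `std::weak_ptr` / `std::shared_ptr` stand in for WeakPtr / RefPtr. The texture source holds the context weakly, but the image it owns still holds it strongly, so the context stays alive after the compositor lets go.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for gl::GLContext.
struct GLContextModel {};

// Models gl::TextureImage, which keeps its own strong reference.
struct TextureImageModel {
  std::shared_ptr<GLContextModel> mGL;
};

// Models the texture source with its direct context reference made weak.
struct TextureSourceModel {
  std::weak_ptr<GLContextModel> mGL;
  std::shared_ptr<TextureImageModel> mImage;
};

// Returns true if the context is still alive after the compositor
// has released its own reference.
bool ContextSurvivesCompositorRelease(bool sourceHoldsImage) {
  auto gl = std::make_shared<GLContextModel>();

  TextureSourceModel source;
  source.mGL = gl;
  if (sourceHoldsImage) {
    source.mImage = std::make_shared<TextureImageModel>();
    source.mImage->mGL = gl;  // indirect strong reference
  }
  assert(!source.mGL.expired());

  gl.reset();  // the compositor drops its reference
  return !source.mGL.expired();
}
```

With `sourceHoldsImage` set to true the context survives the release; with false it is freed immediately, which is the behaviour the weak pointer was meant to give.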
Attachment #9028169 - Attachment is obsolete: true
I have another idea for addressing it. I am going to try it.
I confirmed that the patch addresses the crash.
Attachment #9028549 - Flags: review?(nical.bugzilla)
Attachment #9028549 - Flags: review?(nical.bugzilla) → review+
checkin-needed?
I am planning to check it in after the version becomes Firefox 66.
Pushed by sikeda@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/ceee7e820638
Deallocate device data of TextureImageTextureSourceOGL during destroying CompositorOGL r=nical
https://hg.mozilla.org/mozilla-central/rev/ceee7e820638
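
For readers who do not want to open the changeset, the idea named in the commit message can be sketched like this. It is not the landed patch: the commit message only says that device data is deallocated while CompositorOGL is being destroyed, so the registry of live sources, `Register`/`Unregister`, and all class names here are assumptions made for illustration.

```cpp
#include <cassert>
#include <memory>
#include <unordered_set>

// Hypothetical stand-in for gl::GLContext.
struct GLContextModel {};

class CompositorModel;

// Models a texture source that may outlive the compositor.
class TextureSourceModel {
 public:
  TextureSourceModel(CompositorModel* compositor,
                     std::shared_ptr<GLContextModel> gl);
  ~TextureSourceModel();
  // Drops every GL-side resource, including the context reference.
  void DeallocateDeviceData() {
    mGL.reset();
    mCompositor = nullptr;
  }
  bool HasDeviceData() const { return mGL != nullptr; }

 private:
  CompositorModel* mCompositor;
  std::shared_ptr<GLContextModel> mGL;
};

// Models the compositor; the registry of sources is an assumption.
class CompositorModel {
 public:
  CompositorModel() : mGL(std::make_shared<GLContextModel>()) {}
  std::shared_ptr<GLContextModel> gl() const { return mGL; }
  void Register(TextureSourceModel* s) { mSources.insert(s); }
  void Unregister(TextureSourceModel* s) { mSources.erase(s); }
  // Forces every live source to release its device data before the
  // compositor drops its own context reference.
  void Destroy() {
    for (TextureSourceModel* s : mSources) s->DeallocateDeviceData();
    mSources.clear();
    mGL.reset();
  }

 private:
  std::shared_ptr<GLContextModel> mGL;
  std::unordered_set<TextureSourceModel*> mSources;
};

TextureSourceModel::TextureSourceModel(CompositorModel* compositor,
                                       std::shared_ptr<GLContextModel> gl)
    : mCompositor(compositor), mGL(std::move(gl)) {
  mCompositor->Register(this);
}

TextureSourceModel::~TextureSourceModel() {
  if (mCompositor) mCompositor->Unregister(this);
}

bool ContextFreedAtCompositorDestroy() {
  CompositorModel compositor;
  std::weak_ptr<GLContextModel> watcher = compositor.gl();
  // The source is still alive, as if a texture host were holding it.
  TextureSourceModel source(&compositor, compositor.gl());
  compositor.Destroy();
  return watcher.expired() && !source.HasDeviceData();
}
```

In this model the context is released at `Destroy()` even though the source object itself lives on, so the context no longer outlives the window.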
Status: REOPENED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla66
Flags: qe-verify+
Whiteboard: [wr-reserve] → [wr-reserve][qa-triaged]
QA Whiteboard: [qa-triaged]
Whiteboard: [wr-reserve][qa-triaged] → [wr-reserve]

I could not reproduce this issue using Fx Nightly 58 x64 (20171106100122), on Ubuntu 16.04 LTS (Radeon RX480). Is it a Debian specific issue?

Flags: needinfo?(jan)

Sorry, I can only point to comment 11. Try with build 2018-06-06 or 2018-12-10. You don't need to enable WebRender for this; the GPU process was just incredibly helpful for WebRender stability.

Flags: needinfo?(jan)

Unfortunately, I could not reproduce this issue. I suspect this is due to environmental differences.

Flags: qe-verify+