crash due to invalid use of X error handler

RESOLVED FIXED in Firefox 60

Status

()

P3
normal
RESOLVED FIXED
9 months ago
8 months ago

People

(Reporter: gideon425xx, Assigned: lsalzman)

Tracking

({crash})

59 Branch
mozilla61
All
Linux
crash
Points:
---
Bug Flags:
qe-verify -

Firefox Tracking Flags

(firefox-esr52 wontfix, firefox59 wontfix, firefox60 fixed, firefox61 fixed)

Details

(Whiteboard: [gfx-noted])

Attachments

(2 attachments)

(Reporter)

Description

9 months ago
Created attachment 8962911 [details]
gdb output

Firefox crashed handling an X error.

I am attaching the output of gdb but I have worked out what
the problem is.

firefox59/gfx/cairo/cairo/src/cairo-xlib-surface.c

contains a function _get_image_surface which calls XGetImage
and protects against error by setting the X error handler to
do noop.

This would be okay but the main thread is also using X at the
same time.
It is calling gdk_window_get_root_origin which also alters the
error handler (inside GDK).

It is hard to hit the bug as it requires the two threads to
overlap.

I assume the solution is to have more locking in firefox so
that only one thread is doing X stuff at a time.
Component: General → Graphics
Product: Firefox → Core
(Assignee)

Updated

9 months ago
Keywords: crash
OS: Unspecified → Linux
Priority: -- → P3
Hardware: Unspecified → All
Whiteboard: [gfx-noted]
(Assignee)

Comment 1

9 months ago
Does forcing the environment variable MOZ_CAIRO_FORCE_BUGGY_REPEAT=0 when starting Firefox alleviate the issue? Or at least, does it avoid causing Cairo to drop into _get_image_surface from inside WindowSurfaceX11Image::Commit?
Flags: needinfo?(gideon425xx)
(Reporter)

Comment 2

9 months ago
No, it still calls _get_image_surface.

Here are some instructions to help you reproduce part of the issue.

1. add a debug fprintf to function _get_image_surface

if (!ximage)
{
fprintf(stderr, "assume pixmap\n");
	    surface->use_pixmap = CAIRO_ASSUME_PIXMAP;
}


2. build firefox
3. install Xnest - available as package xnest on ubuntu
4. run Xnest with the command
   Xnest :1 -geometry 900x600 &
5. run a window manager on the xnest display
6. run firefox on xnest display
7. enable the Menu Bar (File, Edit...) by right-clicking
   next to the new tab +
8. move the firefox window left so that it is a litte
   off-screen. File menu item should now not be visible
9. visit https://duckduckgo.com/
10. put cursor on image of duck, so Learn More appears
11. move cursor to left, so Learn More is gone
12. the debug should have been generated. this will
   correspond to the X error
Flags: needinfo?(gideon425xx)
(Assignee)

Comment 3

8 months ago
Created attachment 8966461 [details] [diff] [review]
always composite 24 depth WindowSurfaceX11 as BGRX

This bug occurs when we are not using the XShm path and instead use the WindowSurfaceX11Image fallback.

We are creating a BGRA backing image for Skia, and compositing it to a BGRX window. This triggers a nasty readback for the fallback implementation deep in Cairo. It sees that the depth of the source and dest don't match, and decides to readback the dest window so that it can implement a blend. However, we never really needed a blend here anyway, since the surface really should just be handled as BGRX, except for the compositor itself, which needs to see it as BGRA.

To get past this stand-off, we create the gfxImageSurface as BGRX, and then create the temporary DrawTarget for Lock()ing it as BGRA. This avoids hitting the fallback since Commit() now is just drawing BGRX onto BGRX as far as Cairo is concerned.
Assignee: nobody → lsalzman
Status: NEW → ASSIGNED
Attachment #8966461 - Flags: review?(jmuizelaar)
Attachment #8966461 - Flags: review?(jmuizelaar) → review+

Comment 4

8 months ago
Pushed by lsalzman@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/c6860625a6fd
always composite 24 depth WindowSurfaceX11 as BGRX. r=jrmuizel

Comment 5

8 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/c6860625a6fd
Status: ASSIGNED → RESOLVED
Last Resolved: 8 months ago
status-firefox61: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
(Assignee)

Comment 6

8 months ago
Can you verify this patch fixes the issue for you?
Flags: needinfo?(gideon425xx)
status-firefox59: --- → wontfix
status-firefox60: --- → fix-optional
status-firefox-esr52: --- → wontfix
(Reporter)

Comment 7

8 months ago
Can you clarify how the fix works?
Does it implement locking to avoid the simultaneous use of the error handler or does it
avoid the X error altogether?
Flags: needinfo?(gideon425xx)
(Assignee)

Comment 8

8 months ago
(In reply to gideon425xx from comment #7)
> Can you clarify how the fix works?
> Does it implement locking to avoid the simultaneous use of the error handler
> or does it
> avoid the X error altogether?

It avoids the error. Can you please test it?
Flags: needinfo?(gideon425xx)
(Reporter)

Comment 9

8 months ago
Test done. This fixes it
Flags: needinfo?(gideon425xx)
(Assignee)

Comment 10

8 months ago
Comment on attachment 8966461 [details] [diff] [review]
always composite 24 depth WindowSurfaceX11 as BGRX

Approval Request Comment
[Feature/Bug causing the regression]: bug 1285561
[User impact if declined]: Crashes if Firefox is run on X11 (Linux/BSD/etc.) without XShm support, and bad performance when not crashing.
[Is this code covered by automated tests?]: yes
[Has the fix been verified in Nightly?]: yes
[Needs manual test from QE? If yes, steps to reproduce]: no 
[List of other uplifts needed for the feature/fix]:
[Is the change risky?]: No
[Why is the change risky/not risky?]: Doesn't introduce new behavior. Just essentially reverts back to expected blitting path in Cairo for the compositor.
[String changes made/needed]: none
Attachment #8966461 - Flags: approval-mozilla-beta?
Blocks: 1285561
Comment on attachment 8966461 [details] [diff] [review]
always composite 24 depth WindowSurfaceX11 as BGRX

Bit scary, but I guess not crashing is good :)  approved for 60.0b12
Attachment #8966461 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Setting qe-verify- based on Lee's assessment on manual testing needs (Comment 10) and the fact that this fix has automated tests.
Flags: qe-verify-
You need to log in before you can comment on or make changes to this bug.