Closed Bug 1449352 Opened 6 years ago Closed 6 years ago

crash due to invalid use of X error handler

Categories

(Core :: Graphics, defect, P3)

59 Branch
All
Linux
defect

Tracking

()

RESOLVED FIXED
mozilla61
Tracking Status
firefox-esr52 --- wontfix
firefox59 --- wontfix
firefox60 --- fixed
firefox61 --- fixed

People

(Reporter: gideon425xx, Assigned: lsalzman)

References

Details

(Keywords: crash, Whiteboard: [gfx-noted])

Attachments

(2 files)

Attached file gdb output
Firefox crashed handling an X error.

I am attaching the output of gdb but I have worked out what
the problem is.

firefox59/gfx/cairo/cairo/src/cairo-xlib-surface.c

contains a function _get_image_surface which calls XGetImage
and protects against error by setting the X error handler to
do noop.

This would be okay but the main thread is also using X at the
same time.
It is calling gdk_window_get_root_origin which also alters the
error handler (inside GDK).

It is hard to hit the bug as it requires the two threads to
overlap.

I assume the solution is to have more locking in firefox so
that only one thread is doing X stuff at a time.
Component: General → Graphics
Product: Firefox → Core
Keywords: crash
OS: Unspecified → Linux
Priority: -- → P3
Hardware: Unspecified → All
Whiteboard: [gfx-noted]
Does forcing the environment variable MOZ_CAIRO_FORCE_BUGGY_REPEAT=0 when starting Firefox alleviate the issue? Or at least, does it avoid causing Cairo to drop into _get_image_surface from inside WindowSurfaceX11Image::Commit?
Flags: needinfo?(gideon425xx)
No, it still calls _get_image_surface.

Here are some instructions to help you reproduce part of the issue.

1. add a debug fprintf to function _get_image_surface

if (!ximage)
{
fprintf(stderr, "assume pixmap\n");
	    surface->use_pixmap = CAIRO_ASSUME_PIXMAP;
}


2. build firefox
3. install Xnest - available as package xnest on ubuntu
4. run Xnest with the command
   Xnest :1 -geometry 900x600 &
5. run a window manager on the xnest display
6. run firefox on xnest display
7. enable the Menu Bar (File, Edit...) by right-clicking
   next to the new tab +
8. move the firefox window left so that it is a litte
   off-screen. File menu item should now not be visible
9. visit https://duckduckgo.com/
10. put cursor on image of duck, so Learn More appears
11. move cursor to left, so Learn More is gone
12. the debug should have been generated. this will
   correspond to the X error
Flags: needinfo?(gideon425xx)
This bug occurs when we are not using the XShm path and instead use the WindowSurfaceX11Image fallback.

We are creating a BGRA backing image for Skia, and compositing it to a BGRX window. This triggers a nasty readback for the fallback implementation deep in Cairo. It sees that the depth of the source and dest don't match, and decides to readback the dest window so that it can implement a blend. However, we never really needed a blend here anyway, since the surface really should just be handled as BGRX, except for the compositor itself, which needs to see it as BGRA.

To get past this stand-off, we create the gfxImageSurface as BGRX, and then create the temporary DrawTarget for Lock()ing it as BGRA. This avoids hitting the fallback since Commit() now is just drawing BGRX onto BGRX as far as Cairo is concerned.
Assignee: nobody → lsalzman
Status: NEW → ASSIGNED
Attachment #8966461 - Flags: review?(jmuizelaar)
Attachment #8966461 - Flags: review?(jmuizelaar) → review+
Pushed by lsalzman@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/c6860625a6fd
always composite 24 depth WindowSurfaceX11 as BGRX. r=jrmuizel
https://hg.mozilla.org/mozilla-central/rev/c6860625a6fd
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
Can you verify this patch fixes the issue for you?
Flags: needinfo?(gideon425xx)
Can you clarify how the fix works?
Does it implement locking to avoid the simultaneous use of the error handler or does it
avoid the X error altogether?
Flags: needinfo?(gideon425xx)
(In reply to gideon425xx from comment #7)
> Can you clarify how the fix works?
> Does it implement locking to avoid the simultaneous use of the error handler
> or does it
> avoid the X error altogether?

It avoids the error. Can you please test it?
Flags: needinfo?(gideon425xx)
Test done. This fixes it
Flags: needinfo?(gideon425xx)
Comment on attachment 8966461 [details] [diff] [review]
always composite 24 depth WindowSurfaceX11 as BGRX

Approval Request Comment
[Feature/Bug causing the regression]: bug 1285561
[User impact if declined]: Crashes if Firefox is run on X11 (Linux/BSD/etc.) without XShm support, and bad performance when not crashing.
[Is this code covered by automated tests?]: yes
[Has the fix been verified in Nightly?]: yes
[Needs manual test from QE? If yes, steps to reproduce]: no 
[List of other uplifts needed for the feature/fix]:
[Is the change risky?]: No
[Why is the change risky/not risky?]: Doesn't introduce new behavior. Just essentially reverts back to expected blitting path in Cairo for the compositor.
[String changes made/needed]: none
Attachment #8966461 - Flags: approval-mozilla-beta?
Comment on attachment 8966461 [details] [diff] [review]
always composite 24 depth WindowSurfaceX11 as BGRX

Bit scary, but I guess not crashing is good :)  approved for 60.0b12
Attachment #8966461 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Setting qe-verify- based on Lee's assessment on manual testing needs (Comment 10) and the fact that this fix has automated tests.
Flags: qe-verify-
You need to log in before you can comment on or make changes to this bug.