Closed Bug 1549099 Opened 5 years ago Closed 3 years ago

[Wayland][WebRender] Firefox sometimes falls back to OpenGL compositing when extension popup window is opened

Categories

(Core :: Graphics: WebRender, defect, P3)

Unspecified
Linux
defect

Tracking


RESOLVED DUPLICATE of bug 1681107
Tracking Status
firefox68 --- affected
firefox75 --- affected
firefox86 --- fixed

People

(Reporter: viktor_jaegerskuepper, Unassigned)

References

(Blocks 2 open bugs)

Details

OS: Arch Linux
Desktop environment: GNOME

Even after the fixes for bug 1514156 and bug 1532024, I sometimes get an empty window, but only rarely. In the terminal I see:

Attempting load of libEGL.so
IPDL protocol Error: Received an invalid file descriptor
[GFX1-]: window is null
[GFX1-]: Failed to create EGLSurface
IPDL protocol Error: Received an invalid file descriptor
IPDL protocol Error: Received an invalid file descriptor

Regarding my standard setup, I use:
privacy.resistFingerprinting = true
browser.startup.blankWindow = false (as a workaround for bug 1535565)
browser.privatebrowsing.autostart = true

I don't know if these settings are related to this bug. Maybe this should be a blocker for the fingerprinting-breakage bug 1507517.

The size of the empty window seems to be the correct one, i.e. reduced according to the resistFingerprinting feature.

I can't reproduce it with a clean profile, either with only gfx.webrender.all set to true or with privacy.resistFingerprinting also set to true, even though I started and closed Nightly about 30 times in a row for each configuration.

Blocks: wayland, wr-linux
Component: Graphics → Graphics: WebRender

I just noticed that during the last session (with correctly displayed window content) the following appeared in the terminal:

[GFX1-]: window is null
[GFX1-]: Failed to create EGLSurface

During that session I clicked on the NoScript icon several times, and once the popup below the icon wasn't shown. So far I haven't been able to reproduce this by repeatedly clicking on the NoScript icon.

I don't know if this is actually related, so just tell me if I should file another bug report for this.

you're probably describing Bug 1547277

(In reply to Alexis Beingessner [:Gankro] from comment #2)

you're probably describing Bug 1547277

This is a different bug, but I want to make sure that the NoScript bug is real and related to the GFX1 output in the terminal. Maybe I'm wrong about the NoScript thing and something else led to the "window is null" message.

I quite often see an empty window in Firefox. I normally use Firefox in a maximized window on a fractionally scaled GNOME Wayland desktop (Fedora 30). When the window is blank, unmaximizing it and maximizing it again makes the content render.

https://phabricator.services.mozilla.com/D30282 improves error handling in BeginFrame(). Without it, BeginFrame() would succeed even though no EGL surface had been created, allowing rendering to continue, which in a debug build could result in a panic in WebRender code.

This might be what causes the blank windows if the error occurs early enough.
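A minimal sketch of the failure mode described above, written in Rust for illustration only. It is not the code from D30282, and the types `Compositor`, `EglSurface`, `FrameError` and the function `render_frame` are hypothetical. It models one idea: `begin_frame()` reports an error when no EGL surface exists, so the caller stops before issuing any GL calls, instead of succeeding and letting rendering continue against a missing surface.

```rust
/// Hypothetical stand-in for a native EGL surface handle.
struct EglSurface;

/// Hypothetical error returned when a frame cannot be started.
#[derive(Debug, PartialEq)]
enum FrameError {
    /// Models "[GFX1-]: Failed to create EGLSurface" having happened earlier.
    NoSurface,
}

/// Hypothetical compositor that may or may not own a surface.
struct Compositor {
    surface: Option<EglSurface>,
}

impl Compositor {
    /// Refuse to begin a frame when there is nothing to draw into,
    /// rather than reporting success unconditionally.
    fn begin_frame(&self) -> Result<(), FrameError> {
        if self.surface.is_some() {
            Ok(())
        } else {
            Err(FrameError::NoSurface)
        }
    }
}

/// Hypothetical frame step: GL work (counted in `gl_calls`) only happens
/// after begin_frame() succeeded, so a missing surface never reaches clear().
fn render_frame(c: &Compositor, gl_calls: &mut u32) -> Result<(), FrameError> {
    c.begin_frame()?;
    *gl_calls += 1; // stands in for the clear and draw calls
    Ok(())
}
```

With a surface, `render_frame()` performs its GL work; without one it returns `Err(FrameError::NoSurface)` and the call count stays unchanged, which is the point where a caller can log the failure instead of drawing.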

If anyone wishes to try, I have set off a try build here which should be done soon: https://treeherder.mozilla.org/#/jobs?repo=try&searchStr=gecko%2Cdecision%2Ctask%2Copt%2Caction-callback%2Caction%3A%2Crun%2Call%2Ctalos%2Ctests%2Cac%28rat%29&revision=80310e3a225cdaadaa5f00dcbdd63b4bd3b949f8

Priority: -- → P3

I see this too. It regularly happens with popup windows: when a new profile is created, when a remote add-on popup is shown, and so on.

In a debug build I get:
Hit MOZ_CRASH(Caught GL error 506 at clear) at gfx/wr/webrender/src/device/gl.rs:1215

which indicates that the framebuffer was not created successfully.
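A small helper for reading that message, assuming the code is printed in hexadecimal: decimal 506 (0x1FA) is not an OpenGL error code, while 0x0506 is GL_INVALID_FRAMEBUFFER_OPERATION, which GL raises when a command reads from or renders into a framebuffer that is not complete. That fits the reading that the framebuffer was never set up. The helper names `gl_error_name` and `parse_reported_code` are hypothetical; the numeric values are the standard `glGetError()` codes.

```rust
/// Map a glGetError() value to its symbolic name (standard GL error codes).
fn gl_error_name(code: u32) -> &'static str {
    match code {
        0x0000 => "GL_NO_ERROR",
        0x0500 => "GL_INVALID_ENUM",
        0x0501 => "GL_INVALID_VALUE",
        0x0502 => "GL_INVALID_OPERATION",
        0x0503 => "GL_STACK_OVERFLOW",
        0x0504 => "GL_STACK_UNDERFLOW",
        0x0505 => "GL_OUT_OF_MEMORY",
        0x0506 => "GL_INVALID_FRAMEBUFFER_OPERATION",
        0x0507 => "GL_CONTEXT_LOST",
        _ => "unknown GL error",
    }
}

/// Extract the code from a "Caught GL error NNN at ..." message,
/// assuming NNN is printed in hexadecimal.
fn parse_reported_code(msg: &str) -> Option<u32> {
    let rest = msg.split("Caught GL error ").nth(1)?;
    let digits = rest.split_whitespace().next()?;
    u32::from_str_radix(digits, 16).ok()
}
```

For the MOZ_CRASH line above, `parse_reported_code()` yields `Some(0x0506)`, and `gl_error_name(0x0506)` returns `"GL_INVALID_FRAMEBUFFER_OPERATION"`.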

Status: UNCONFIRMED → NEW
Ever confirmed: true

Main process bt:
#0 0x00007fa09ebcb4d5 in pthread_cond_wait@@GLIBC_2.3.2 () at /lib64/libpthread.so.0
#1 0x0000559084523ca7 in mozilla::detail::ConditionVariableImpl::wait(mozilla::detail::MutexImpl&) (this=0x7fa077e7c8a8, lock=...)
at /home/komat/tmp676-trunk-gtk3/src-wayland/mozglue/misc/ConditionVariable_posix.cpp:109
#2 0x0000559084523d80 in mozilla::detail::ConditionVariableImpl::wait_for(mozilla::detail::MutexImpl&, mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator> const&)
(this=0x7fa077e7c8a8, lock=..., a_rel_time=...) at /home/komat/tmp676-trunk-gtk3/src-wayland/mozglue/misc/ConditionVariable_posix.cpp:116
#3 0x00007fa091e77324 in mozilla::OffTheBooksCondVar::Wait(mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator>) (this=0x7fa077e7c888, aDuration=...)
at /home/komat/tmp676-trunk-gtk3/src-wayland/xpcom/threads/BlockingResourceBase.cpp:561
#4 0x00007fa091ea64f9 in mozilla::Monitor::Wait(mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator>) (this=0x7fa077e7c840, aDuration=...)
at /home/komat/tmp676-trunk-gtk3/src-wayland/objdir/dist/include/mozilla/Monitor.h:37
#5 0x00007fa09292c4cb in mozilla::ipc::MessageChannel::WaitForSyncNotify(bool) (this=0x7fa077c33120) at /home/komat/tmp676-trunk-gtk3/src-wayland/ipc/glue/MessageChannel.cpp:2346
#6 0x00007fa09292b8c7 in mozilla::ipc::MessageChannel::Send(IPC::Message*, IPC::Message*) (this=0x7fa077c33120, aMsg=0x7fa0776e7300, aReply=0x7ffc25b92218)
at /home/komat/tmp676-trunk-gtk3/src-wayland/ipc/glue/MessageChannel.cpp:1537
#7 0x00007fa0929396a2 in mozilla::ipc::IProtocol::ChannelSend(IPC::Message*, IPC::Message*) (this=0x7fa077c34400, aMsg=0x7fa0776e7300, aReply=0x7ffc25b92218)
at /home/komat/tmp676-trunk-gtk3/src-wayland/ipc/glue/ProtocolUtils.cpp:575
#8 0x00007fa092a4de82 in mozilla::layers::PCompositorBridgeChild::SendFlushRendering() (this=0x7fa077c34400) at PCompositorBridgeChild.cpp:1003
#9 0x00007fa093a3b1c6 in mozilla::layers::CompositorBridgeChild::SendFlushRendering() (this=0x7fa077c34400)
at /home/komat/tmp676-trunk-gtk3/src-wayland/gfx/layers/ipc/CompositorBridgeChild.cpp:750
#10 0x00007fa09389d499 in mozilla::layers::WebRenderLayerManager::FlushRendering() (this=0x7fa077eca000)
at /home/komat/tmp676-trunk-gtk3/src-wayland/gfx/layers/wr/WebRenderLayerManager.cpp:708
#11 0x00007fa096d5adca in nsViewManager::Refresh(nsView*, mozilla::gfx::IntRegionTyped<mozilla::LayoutDevicePixel> const&) (this=0x7fa077e0ed80, aView=0x7fa077e11280, aRegion=...)
at /home/komat/tmp676-trunk-gtk3/src-wayland/view/nsViewManager.cpp:336
#12 0x00007fa096d590bf in nsViewManager::PaintWindow(nsIWidget*, mozilla::gfx::IntRegionTyped<mozilla::LayoutDevicePixel> const&) (this=0x7fa077e0ed80, aWidget=0x7fa07e21e800, aRegion=...)
at /home/komat/tmp676-trunk-gtk3/src-wayland/view/nsViewManager.cpp:692
#13 0x00007fa096d58faf in nsView::PaintWindow(nsIWidget*, mozilla::gfx::IntRegionTyped<mozilla::LayoutDevicePixel>) (this=0x7fa077e11280, aWidget=0x7fa07e21e800, aRegion=...)
at /home/komat/tmp676-trunk-gtk3/src-wayland/view/nsView.cpp:997
#14 0x00007fa096defd48 in nsWindow::OnExposeEvent(_cairo*) (this=0x7fa07e21e800, cr=0x7fa078bd8000) at /home/komat/tmp676-trunk-gtk3/src-wayland/widget/gtk/nsWindow.cpp:2228
#15 0x00007fa096dfda55 in draw_window_of_widget(_GtkWidget*, _GdkWindow*, _cairo*) (widget=0x7fa07b41c530, aWindow=0x7fa09e4c6c70, cr=0x7fa078bd8000)
at /home/komat/tmp676-trunk-gtk3/src-wayland/widget/gtk/nsWindow.cpp:5483
#16 0x00007fa096dfa6fd in expose_event_cb(_GtkWidget*, _cairo*) (widget=0x7fa07b41c530, cr=0x7fa078bd8000) at /home/komat/tmp676-trunk-gtk3/src-wayland/widget/gtk/nsWindow.cpp:5504

Render process bt:
It crashes in pub fn clear_target(), which throws the error. I can't get a clear backtrace from gdb.

This looks like a widget bug; mContainer is not created yet:

children = 0x0,
surface = 0x0,
subsurface = 0x0,
eglwindow = 0x0,
frame_callback_handler = 0x0,
frame_callback_handler_surface_id = -1,
surface_needs_clear = 1,
ready_to_draw = 1,

Assignee: nobody → stransky
Component: Graphics: WebRender → Widget: Gtk

It seems to be a WR bug after all. Here's a simple reproducer:

  1. Install any extension with a popup large enough that WR is used to draw it. I use gecko-profiler.
  2. Launch FF with WR enabled
  3. Open the extension popup (gecko-profiler). With Bug 1565583 checked in, you get an empty window with no content drawn.

I can reproduce it reliably in debug builds but it needs Bug 1565583 to get there.

I did some debugging, and it looks like the fix from Bug 1532024 does not work reliably for remote content. nsWindow::WaylandEGLSurfaceForceRedraw() is correctly called when the mozcontainer is ready, and even CompositorWidgetDelegate->RequestsUpdatingEGLSurface() is processed, but the remote content in the popup frame is not redrawn.

Sotaro, any idea here?

Assignee: stransky → nobody
Component: Widget: Gtk → Graphics: WebRender
Flags: needinfo?(sotaro.ikeda.g)
Summary: Wayland sometimes (rarely) displays an empty window with WebRender and MOZ_ENABLE_WAYLAND=1 → [Wayland][WebRender] FF displays an empty popup window for extensions

It seems that there are several problems around RequestsUpdatingEGLSurface(). Bug 1565785 and Bug 1566468 address some of them. Can you check if the patches of Bug 1565785 and Bug 1566468 address the problem of this bug? Bug 1565785 is already in m-c.

Flags: needinfo?(sotaro.ikeda.g)

It's still here, and it's clearly visible in a debug build with the gecko-profiler add-on. The first time the add-on popup is opened it shows an empty window; every subsequent opening shows the correct content.

During the last start I got this on the command line:

Attempting load of libEGL.so
[GFX1-]: window is null
[GFX1-]: Failed to create EGLSurface
[GFX1-]: We don't have EGLSurface to draw into. Called too early?
[GFX1-]: Compositors might be mixed (5,2)

However, the window content was displayed normally. In the Graphics section of about:support I see the following:

Compositing: OpenGL
WebRender:
opt-in by default: WebRender is an opt-in feature
available by user: Force enabled by pref
unavailable by runtime: Failed to create new surface

The lines from the terminal output are also listed under "Error log".

I could reproduce what Martin sees with the gecko-profiler add-on, but I was using a standard Nightly build, and in the console I got:

[GFX1-]: We don't have EGLSurface to draw into. Called too early?
[GFX1-]: We don't have EGLSurface to draw into. Called too early?

It seems I can no longer get an empty main window at launch, because Firefox now appears to fall back to OpenGL compositing instead; I could trigger this fallback several times. Note that I have a very old and "slow" CPU (Intel Core 2 Duo).

(In reply to Viktor Jägersküpper from comment #13)

I could reproduce what Martin sees with the gecko-profiler add-on, but I was using a standard Nightly build, and in the console I got:

[GFX1-]: We don't have EGLSurface to draw into. Called too early?
[GFX1-]: We don't have EGLSurface to draw into. Called too early?

That's the result of Bug 1565583, which added a check for this, so the failure is now reported.

After starting from the command line I clicked on the NoScript icon and Firefox crashed with signature "mozilla::layers::CompositorBridgeParent::ResumeComposition". If it helps, this is the report:

https://crash-stats.mozilla.org/report/index/9e286790-f45d-4d7f-844b-7e4880190724

(In reply to Viktor Jägersküpper from comment #15)

After starting from the command line I clicked on the NoScript icon and Firefox crashed with signature "mozilla::layers::CompositorBridgeParent::ResumeComposition". If it helps, this is the report:

https://crash-stats.mozilla.org/report/index/9e286790-f45d-4d7f-844b-7e4880190724

Thanks for reporting it! This is a regression of Bug 1565785. I am not sure if it is related to this bug. Bug 1568748 is created for it.

See Also: → 1568748

(In reply to Sotaro Ikeda [:sotaro] from comment #16)

Thanks for reporting it! This is a regression of Bug 1565785. I am not sure if it is related to this bug. Bug 1568748 is created for it.

The crash happened on a non-WebRender widget, so it does not seem to be related to this bug.

I wonder if Bug 1565785 and Bug 1588987 might address the problem.

(In reply to Sotaro Ikeda [:sotaro] from comment #18)

I wonder if Bug 1565785 and Bug 1588987 might address the problem.

Unfortunately the bug is still there for me.

I found out that this bug is now fixed for me, so I used mozregression to find out why. It turned out that two changes were required: After bug 1542808 the NoScript popup often wasn't displayed and I got "We don't have EGLSurface to draw into. Called too early?", but the fallback to OpenGL rendering was gone. And after bug 1597861 the popup was always displayed properly. So I'm closing this bug as WORKSFORME.

There are, however, two remaining problems:
I can still (but rarely) get "Failed to connect WebRenderBridgeChild.", e.g. when I click on the NoScript icon repeatedly while the YouTube homepage is loading.

This is why I tried a recent Mozilla debug build (2019-12-12) via mozregression, and I could easily get Firefox to crash by clicking on the NoScript icon. I will report this crash later, because I first want to try to find a regression window for it. Maybe the crash is related to the other issue.

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → WORKSFORME

I filed bug 1603835 for the crash with the debug build.

See Also: → 1603835

During a session today (using Nightly) I noticed that Firefox started to behave strangely: for example, the web content didn't scroll while I was dragging the scrollbar, and the position on the page only changed when I released the mouse button (a bit like using the Page Up/Down keys, but not smooth at all). So I closed Firefox and saw these messages in the terminal:

[GFX1-]: window is null
[GFX1-]: Failed to create EGLSurface
[GFX1-]: We don't have EGLSurface to draw into. Called too early?
[GFX1-]: We don't have EGLSurface to draw into. Called too early?
[GFX1-]: Compositors might be mixed (5,2)

In the next session the same thing happened again after a few minutes, and about:support showed that compositing had changed to OpenGL. I assume that clicking on the NoScript or uBlock Origin icon triggered this, because I can't think of anything else that could be the cause. So it looks like the root cause of this bug is still not fixed, although Firefox works reliably for me with Wayland+WebRender most of the time.
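The pair in "Compositors might be mixed (5,2)" can be decoded with a sketch like the one below. It assumes the two numbers are values of Gecko's LayersBackend enum, printed as (backend already in use, newly created backend), and that the enum was ordered NONE, BASIC, OPENGL, D3D11, CLIENT, WR at the time; both are assumptions to check against Gecko's LayersTypes.h, and the Rust names are hypothetical. Under that reading, (5,2) means a WebRender compositor already existed when an OpenGL one was created, which matches about:support reporting OpenGL compositing after the fallback.

```rust
/// Assumed ordering of Gecko's LayersBackend enum at the time (unverified).
#[derive(Debug, Clone, Copy, PartialEq)]
enum LayersBackend {
    LayersNone,
    Basic,
    OpenGl,
    D3d11,
    Client,
    WebRender,
}

/// Convert an integer printed in the log to the assumed enum value.
fn backend_from_int(v: i32) -> Option<LayersBackend> {
    match v {
        0 => Some(LayersBackend::LayersNone),
        1 => Some(LayersBackend::Basic),
        2 => Some(LayersBackend::OpenGl),
        3 => Some(LayersBackend::D3d11),
        4 => Some(LayersBackend::Client),
        5 => Some(LayersBackend::WebRender),
        _ => None,
    }
}

/// Parse "(existing,created)" from the last parenthesised group of a
/// "Compositors might be mixed (5,2)" line (terminal or about:support form).
fn parse_mixed(msg: &str) -> Option<(LayersBackend, LayersBackend)> {
    let start = msg.rfind('(')? + 1;
    let end = start + msg[start..].find(')')?;
    let mut parts = msg[start..end].split(',');
    let existing = backend_from_int(parts.next()?.trim().parse::<i32>().ok()?)?;
    let created = backend_from_int(parts.next()?.trim().parse::<i32>().ok()?)?;
    Some((existing, created))
}
```

Using the last parenthesised group lets the same parser handle both the terminal form and the about:support form, where the line starts with an index such as "(#18)".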

I am changing the title to something more appropriate.

Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Summary: [Wayland][WebRender] FF displays an empty popup window for extensions → [Wayland][WebRender] Firefox sometimes falls back to OpenGL compositing when extension popup window is opened

If it fell back to OpenGL, I assume you have layers.acceleration.force-enabled set to true? The GPU process may have crashed. Anything interesting in about:crashes?

Flags: needinfo?(viktor_jaegerskuepper)

I have not set layers.acceleration.force-enabled to true. Concerning graphics-related settings, layers.acceleration.disabled is set to false (the default) and I have only set gfx.webrender.all to true. If I am not mistaken, this is the correct way to enable WebRender. about:crashes does not list a crash related to this bug and never has (apart from the specific crashes I have reported, see "See Also").

I tried to reproduce this bug today (which took a while). When I succeeded, this is what happened in detail:

  • I changed tabs (to wiki.mozilla.org)
  • I clicked on the NoScript icon (mozilla.org and mozilla.net have "script" set to allowed) and the popup window opened
    -> the web content disappeared
  • I clicked several times on the blank space until the content was displayed again
  • about:support shows
    Compositing: OpenGL (instead of WebRender)
    Decision log:
    GPU_PROCESS: blocked by runtime: Wayland does not work in the GPU process
    WEBRENDER: opt-in by default: WebRender is an opt-in feature
    available by user: Force enabled by pref
    unavailable by runtime: Failed to create new surface
    WEBRENDER_QUALIFIED: denied by env: Not on allowlist
    WEBRENDER_COMPOSITOR: disabled by default: Disabled by default
    WEBGPU: disabled by default: Disabled by default
    Error log:
    (#0) CP+[GFX1-]: Failed to connect WebRenderBridgeChild.
    (#4) CP+[GFX1-]: Failed to connect WebRenderBridgeChild.
    (#5) CP+[GFX1-]: Failed to connect WebRenderBridgeChild.
    (#6) CP+[GFX1-]: Failed to connect WebRenderBridgeChild.
    (#7) CP+[GFX1-]: Failed to connect WebRenderBridgeChild.
    (#8) CP+[GFX1-]: Failed to connect WebRenderBridgeChild.
    (#9) CP+[GFX1-]: Failed to connect WebRenderBridgeChild.
    (#10) CP+[GFX1-]: Failed to connect WebRenderBridgeChild.
    (#11) CP+[GFX1-]: Failed to connect WebRenderBridgeChild.
    (#12) CP+[GFX1-]: Failed to connect WebRenderBridgeChild.
    (#13) CP+[GFX1-]: Failed to connect WebRenderBridgeChild.
    (#14) Error window is null
    (#15) Error Failed to create EGLSurface
    (#16) Error We don't have EGLSurface to draw into. Called too early?
    (#17) Error We don't have EGLSurface to draw into. Called too early?
    (#18) Error Compositors might be mixed (5,2)

Please ignore the "Failed to connect WebRenderBridgeChild" messages; they were logged earlier while I was trying to reproduce this bug. I have looked into that and will handle it in another bug when I have some time.

Here are some further observations of the "strange behaviour" I am now experiencing while in the "mixed compositor" state (or whatever it is):

  • the scrollbar does not move when I scroll (by clicking on it and holding the mouse button) until I release the mouse button again
  • the mouse pointer does not change to a hand when I point at a link
  • pointing at a link does not show the link target in the lower left corner of the browser window
  • when I point the mouse at a YouTube video (while it is playing), the control overlay does not appear
    -> when I click, the video playback stops and the overlay is displayed; after another click and waiting a few seconds the mouse pointer disappears (this is expected and the default behaviour), but I can only see it again when I move it outside the window (e.g. by moving it to the top bar of the GNOME Shell)
  • when I want to select text (e.g. for copying later), it is only highlighted after I release the mouse button
  • actually, while I was typing this and looking for issues, everything worked correctly again after I switched to a window of a different application via ALT-TAB. I can make the issues reappear in the same way, although I haven't found a reliable way to do so. In about:support, Compositing remains OpenGL the whole time.

Last but not least, what I observe now is caused by some change(s) that landed in May/June/July 2019, as the first comments show (Martin was the first to notice this change; I wasn't using the Wayland backend at that time). I will try to find the change(s) with mozregression; maybe that will help you find the root cause.

Flags: needinfo?(viktor_jaegerskuepper)

This should have been fixed in bug 1681107 - please reopen if you still see it.

Status: REOPENED → RESOLVED
Closed: 4 years ago → 3 years ago
Resolution: --- → DUPLICATE

This has been fixed in bug 1681107 indeed. Thank you very much for fixing this annoying bug!
