Closed Bug 1650583 Opened 5 months ago Closed 3 months ago

[X11][EGL] Transparent window with proprietary Nvidia driver

Categories

(Core :: Graphics: WebRender, defect, P3)

x86_64
Linux
defect

Tracking

VERIFIED FIXED
82 Branch
Tracking Status
firefox-esr68 --- unaffected
firefox-esr78 --- unaffected
firefox77 --- unaffected
firefox78 --- unaffected
firefox79 --- unaffected
firefox80 --- disabled
firefox81 --- disabled
firefox82 --- verified

People

(Reporter: jan, Assigned: stransky)

References

(Blocks 2 open bugs)

Details

(Keywords: correctness, nightly-community)

Attachments

(11 files)

Only the proprietary Nvidia driver has this issue. Mesa/Nouveau is fine.

Surfman had the same problem and this PR fixed it: https://github.com/servo/surfman/pull/178

Avoid calling eglMakeCurrent prior to creating a window surface

Other suggestion:

I looked at a couple of other libraries that use EGL on various hardware, and another common approach to resolving this issue is to create a 1x1 pbuffer "dummy surface", and make current against that when you want to unlock your main surface.
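
A rough sketch of the dummy-surface idea, in Python for brevity. The three constants are the real EGL enum values and the attribute list has the shape that eglCreatePbufferSurface expects, but surface_to_bind and its arguments are hypothetical names used only for illustration. They are not surfman or Gecko code, and no EGL call is actually made here.

```python
# EGL enum values as defined in the Khronos EGL headers.
EGL_HEIGHT = 0x3056
EGL_WIDTH = 0x3057
EGL_NONE = 0x3038


def dummy_pbuffer_attribs():
    """Flat key/value attribute list for a 1x1 pbuffer, terminated by
    EGL_NONE, in the form eglCreatePbufferSurface takes."""
    return [EGL_WIDTH, 1, EGL_HEIGHT, 1, EGL_NONE]


def surface_to_bind(main_surface, dummy_surface, release_main=False):
    """Hypothetical helper: choose the surface to pass to eglMakeCurrent.

    The dummy pbuffer is used when the main surface does not exist yet,
    or when the caller wants to unlock (release) the main surface while
    keeping the context current.
    """
    if main_surface is None or release_main:
        return dummy_surface
    return main_surface
```

With this shape the context is never made current without a surface: before the window surface exists, and whenever the main surface has to be released, the 1x1 pbuffer stands in for it.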

Attached file nvidia.txt

MOZ_X11_EGL=1 mozregression --launch 2020-07-04 --pref gfx.webrender.all:true layers.gpu-process.enabled:false -B debug > nvidia.txt

0:38.43 INFO: b'[GLX] window 21d has VisualID 0x21'
0:38.77 INFO: b'Initializing context 0x7f13d157b6b1 surface (nil) on display 0x7f13d3ff35c0'
0:38.77 INFO: b'[6523, Renderer] WARNING: Failed to make GL context current!: file /builds/worker/checkouts/gecko/gfx/gl/GLContextProviderEGL.cpp, line 470'
0:38.77 INFO: b'EGL Error: 0x3000'
0:38.77 INFO: b'[6523, Renderer] WARNING: GLContext::InitWithPrefix failed!: file /builds/worker/checkouts/gecko/gfx/gl/GLContext.cpp, line 322'
0:38.77 INFO: b'Destroying context 0x7f13d157b6b1 surface (nil) on display 0x7f13d3ff35c0'
0:38.77 INFO: b'[GFX1-]: Failed to create EGLContext!: 0x3000'

Jan, could you try this build and post the output? https://treeherder.mozilla.org/#/jobs?repo=try&revision=d13b3816de849be44409c0f0cefae3564516e015

So far I haven't been able to reproduce a context initialization without a surface on Mesa/Intel, i.e. the "Initializing context 0x7f13d157b6b1 surface (nil) on display 0x7f13d3ff35c0" part.
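
To check other logs for the same symptom, a small script along these lines might help. It is only a sketch: the regular expression is based on the "Initializing context" lines quoted in this bug, and surfaceless_contexts is a hypothetical helper name.

```python
import re

# Matches debug lines such as:
#   Initializing context 0x7f13d157b6b1 surface (nil) on display 0x7f13d3ff35c0
_INIT_RE = re.compile(
    r"Initializing context (0x[0-9a-fA-F]+) surface (\(nil\)|0x[0-9a-fA-F]+)"
)


def surfaceless_contexts(lines):
    """Return the addresses of contexts that were initialized with a
    (nil) surface, in the order they appear in the log."""
    hits = []
    for line in lines:
        match = _INIT_RE.search(line)
        if match and match.group(2) == "(nil)":
            hits.append(match.group(1))
    return hits
```

For example, surfaceless_contexts(open("nvidia.txt")) would list every context in the attached log that was set up without a surface.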

The close-window dialog (shown when many tabs are open) is also transparent when opened with the amdgpu driver and EGL, until some random action, such as taking a screenshot, makes the window stop being transparent.

(In reply to albertogomezmarin from comment #3)

The close-window dialog (shown when many tabs are open) is also transparent when opened with the amdgpu driver and EGL, until some random action, such as taking a screenshot, makes the window stop being transparent.

Can confirm this one, too - however only with WebRender. Can you confirm that?

(In reply to albertogomezmarin from comment #3)

The close-window dialog (shown when many tabs are open) is also transparent when opened with the amdgpu driver and EGL, until some random action, such as taking a screenshot, makes the window stop being transparent.

I think this belongs to bug 1650246

Edit: I think so because I can reproduce on intel/mesa, too, without contexts being initialized without surface.

(In reply to Robert Mader [:rmader] from comment #4)

(In reply to albertogomezmarin from comment #3)

The close-window dialog (shown when many tabs are open) is also transparent when opened with the amdgpu driver and EGL, until some random action, such as taking a screenshot, makes the window stop being transparent.

Can confirm this one, too - however only with WebRender. Can you confirm that?

It may be; I haven't tested without WebRender.

(In reply to Robert Mader [:rmader] from comment #5)

(In reply to albertogomezmarin from comment #3)

The close-window dialog (shown when many tabs are open) is also transparent when opened with the amdgpu driver and EGL, until some random action, such as taking a screenshot, makes the window stop being transparent.

I think this belongs to bug 1650246

Okay, I'm going to comment there.

(In reply to Robert Mader [:rmader] from comment #2)
This try build doesn't have the changes from bug 1640048; it doesn't support MOZ_X11_EGL and uses GLX.

(In reply to Jan Andre Ikenmeyer [:darkspirit] from comment #8)

(In reply to Robert Mader [:rmader] from comment #2)
This try build doesn't have the changes from bug 1640048; it doesn't support MOZ_X11_EGL and uses GLX.

Err, sorry, still learning how to work with mercurial / hg and try pushes. This should work now: https://treeherder.mozilla.org/#/jobs?repo=try&revision=eef0d18bd5e2cf4679b9aa33e90ce4ff8fafd506

(In reply to Robert Mader [:rmader] from comment #9)
MOZ_X11_EGL=1 mozregression --repo try --launch eef0d18bd5e2cf4679b9aa33e90ce4ff8fafd506 --pref gfx.webrender.all:true layers.gpu-process.enabled:false -B debug > log-eef0d18bd5e2cf4679b9aa33e90ce4ff8fafd506.txt

0:52.61 INFO: b'Initializing context 0x7f63ffe786b1 surface (nil) (fallback surface (nil)) on display 0x7f6405fa67a0'
0:52.60 INFO: b'MakeCurrentImpl: Use fallback: (nil)'
Attached file debug.txt

MOZ_X11_EGL=1 mozregression --repo try --launch eef0d18bd5e2cf4679b9aa33e90ce4ff8fafd506 --pref gfx.webrender.all:true layers.gpu-process.enabled:false -B debug 2>&1 > debug.txt

Same.

I see some indications that this might have to do with WR - can you check what happens with GL / basic layers (including some WebGL test)?

p.s.: I'm afraid I'll need a device to test the nvidia proprietary driver going forward :(

MOZ_X11_EGL=1 mozregression --launch 2020-07-05 --pref gfx.webrender.force-disabled:true layers.gpu-process.enabled:false layers.acceleration.force-enabled:true -a about:support

As with Surfman, this is not related to WebRender but to the OpenGL contexts themselves.

Severity: -- → S3
Depends on: gfx-triage
Priority: -- → P3
Blocks: 1625070
See Also: → 1652310

EGL + proprietary Nvidia does not work on GNOME Wayland (with EGLStreams) either.

[GFX1-]: Failed to create EGLSurface
[GFX1-]: We don't have EGLSurface to draw into. Called too early?
Gdk-Message: 14:38:16.467: Error 71 (Protocol error) dispatching to Wayland display.

Blocks: 1646135

On Mesa/Gnome Wayland, WebRender falls back to OpenGL when you activate autoscroll or open an addon panel. It has a similar error message:

[GFX1-]: window is null
[GFX1-]: Failed to create EGLSurface
[GFX1-]: We don't have EGLSurface to draw into. Called too early?
[GFX1-]: Compositors might be mixed (5,2)

I don't know if this bug could fix these Wayland bugs, but wanted to mention it in case they are related.

Blocks: 1645677
Blocks: 1638084
No longer depends on: gfx-triage

Can someone with an Nvidia card (Jan maybe?) do me a favour and paste the output of the following try build? I still don't have an Nvidia machine, but maybe we can find out what's wrong regardless.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=94479d73bea8d9b1e398a63901421f55e7ea57e9

MOZ_X11_EGL=1 mozregression --repo try --launch 94479d73bea8d9b1e398a63901421f55e7ea57e9 --pref gfx.webrender.all:true -P stdout

Attached file debug.txt

(In reply to Robert Mader [:rmader] from comment #16)

Can someone with an Nvidia card (Jan maybe?) do me a favour and paste the output of the following try build? I still don't have an Nvidia machine, but maybe we can find out what's wrong regardless.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=94479d73bea8d9b1e398a63901421f55e7ea57e9

MOZ_X11_EGL=1 mozregression --repo try --launch 94479d73bea8d9b1e398a63901421f55e7ea57e9 --pref gfx.webrender.all:true -P stdout

Here you go.

(In reply to im.helloer from comment #17)

Here you go.

Thanks! Getting closer here - can you also post the output of https://treeherder.mozilla.org/#/jobs?repo=try&revision=bad202d75aa7bd33d7fefa518339a757fc4008a7 once it's ready?

Attached file debug2.txt

(In reply to Robert Mader [:rmader] from comment #18)

(In reply to im.helloer from comment #17)

Here you go.

Thanks! Getting closer here - can you also post the output of https://treeherder.mozilla.org/#/jobs?repo=try&revision=bad202d75aa7bd33d7fefa518339a757fc4008a7 once it's ready?

Thanks again. So surface creation fails with 0x3009. The spec (1) says:

EGL_BAD_MATCH: Arguments are inconsistent (for example, a valid context requires buffers not supplied by a valid surface).
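
For reference, the hexadecimal codes in the logs in this bug map to the symbolic names in the EGL headers. A minimal lookup sketch follows; egl_error_name is a hypothetical helper, not part of Gecko, and the values are the standard EGL error enums.

```python
# Standard EGL error codes from the Khronos EGL headers.
EGL_ERRORS = {
    0x3000: "EGL_SUCCESS",
    0x3001: "EGL_NOT_INITIALIZED",
    0x3002: "EGL_BAD_ACCESS",
    0x3003: "EGL_BAD_ALLOC",
    0x3004: "EGL_BAD_ATTRIBUTE",
    0x3005: "EGL_BAD_CONFIG",
    0x3006: "EGL_BAD_CONTEXT",
    0x3007: "EGL_BAD_CURRENT_SURFACE",
    0x3008: "EGL_BAD_DISPLAY",
    0x3009: "EGL_BAD_MATCH",
    0x300A: "EGL_BAD_NATIVE_PIXMAP",
    0x300B: "EGL_BAD_NATIVE_WINDOW",
    0x300C: "EGL_BAD_PARAMETER",
    0x300D: "EGL_BAD_SURFACE",
    0x300E: "EGL_CONTEXT_LOST",
}


def egl_error_name(code):
    """Translate a numeric eglGetError() result into its symbolic name."""
    return EGL_ERRORS.get(code, f"unknown (0x{code:04X})")


print(egl_error_name(0x3000))  # EGL_SUCCESS
print(egl_error_name(0x3009))  # EGL_BAD_MATCH
```

Note that the 0x3000 reported next to the "Failed to make GL context current!" warning in the first log is EGL_SUCCESS, so that code on its own does not say why the call failed.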

We currently use the custom GLX-based function FindVisual (2) to set the appropriate visual for the window. The following build uses a fallback path with gdk_screen_get_rgba_visual instead. This might avoid the issue and gets rid of more GLX-specific code. On Mesa it appears to work just as well; I'm just not sure about non-composited window managers. OTOH I'm also not sure if I should care :)

https://treeherder.mozilla.org/#/jobs?repo=try&revision=25afa11020dc39c8651598b48b10e92c5496e7b1

If anyone could try it and post the output, whether it works or not, that would again be much appreciated.

1: https://www.khronos.org/registry/EGL/sdk/docs/man/html/eglGetError.xhtml
2: https://searchfox.org/mozilla-central/source/gfx/gl/GLContextProviderGLX.cpp#791

I'm working on it right now.

Assignee: nobody → stransky
No longer blocks: 1638084, 1645677, 1646135

When a GLX Vsync source is created alongside EGL contexts, NVIDIA drivers refuse to make any EGL context current.
So disable GLX Vsync source creation when an EGL context is used.

No longer blocks: 1625070

Implement GLContextEGL::FindVisual() as an EGL counterpart of the GLContextGLX::FindVisual() used by GLX.

We need to make sure that the GdkWindow uses the same visual as the GL framebuffer we use for it.
That was already implemented for the GLX backend (Bug 1478454).

The visual match is implemented via a visual parameter to the CreateConfig()/CreateConfigScreen() routines: when it is non-zero, they try to find an exact match based on the visual ID.
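
The matching rule described above can be sketched roughly as follows. This is a simplified Python model, not the actual patch: Config and choose_config are hypothetical names, native_visual_id stands for the value a real implementation would read from each EGLConfig via the EGL_NATIVE_VISUAL_ID attribute, and returning None when no config matches is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Config:
    """Hypothetical stand-in for an EGLConfig plus queried attributes."""
    config_id: int
    native_visual_id: int  # value of EGL_NATIVE_VISUAL_ID for this config


def choose_config(configs: Sequence[Config], visual_id: int = 0) -> Optional[Config]:
    """Pick a config; if visual_id is non-zero, require an exact match."""
    if visual_id == 0:
        # No visual requested: take the first candidate, if any.
        return configs[0] if configs else None
    for config in configs:
        if config.native_visual_id == visual_id:
            return config
    # Assumption: report failure rather than silently using another visual.
    return None
```

Requiring an exact match is what keeps the GdkWindow visual and the GL framebuffer consistent.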

Depends on D87635

Jeff, ping, any update here? The last one is waiting for review.
Thanks.

Flags: needinfo?(jgilbert)
See Also: → 1640779
Pushed by nerli@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/bbb04547d367
[Linux/EGL] Use GLX Vsync source on GLX only, r=jgilbert
https://hg.mozilla.org/integration/autoland/rev/b24be6b2d8cd
[Linux/EGL] Log eglCreateWindowSurface failure, r=jgilbert
https://hg.mozilla.org/integration/autoland/rev/e6a03fea3aad
[Linux/EGL] Implement GLContextEGL::FindVisual(), r=jgilbert
https://hg.mozilla.org/integration/autoland/rev/033e491241b1
[Linux/EGL] Use GLContextEGL::FindVisual() in nsWindow::Create to set visual for GdkWindow, r=jhorak

Will look at it, Thanks.

Flags: needinfo?(stransky)
Flags: needinfo?(jgilbert)
Pushed by cbrindusan@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/20b1b9a1db08
[Linux/EGL] Use GLX Vsync source on GLX only, r=jgilbert
https://hg.mozilla.org/integration/autoland/rev/dd2d6a667e9c
[Linux/EGL] Log eglCreateWindowSurface failure, r=jgilbert
https://hg.mozilla.org/integration/autoland/rev/915c19da6fce
[Linux/EGL] Implement GLContextEGL::FindVisual(), r=jgilbert
https://hg.mozilla.org/integration/autoland/rev/e9d534d12e77
[Linux/EGL] Use GLContextEGL::FindVisual() in nsWindow::Create to set visual for GdkWindow, r=jhorak
Regressions: 1663003

Since the statuses differ between nightly and release, what's the status for beta?
For more information, please visit auto_nag documentation.

Gnome X11, Debian Testing, Nvidia GTX 1060, driver 450.57
$ MOZ_X11_EGL=1 mozregression --launch 20200903151816 --pref gfx.webrender.all:true -a about:support
Verified fixed, thank you! :)

$ MOZ_X11_EGL=1 mozregression --launch 20200903151816 --pref gfx.webrender.all:true gfx.webrender.max-partial-present-rects:1 -a https://www.youtube.com/watch?v=LXb3EKWsInQ
I can't verify whether partial present works because Meta.add_clutter_debug_flags(0, Clutter.DrawDebugFlag.PAINT_DAMAGE_REGION, 0) (bug 1648872 comment 0) is less precise on X11. It worked two months ago with a small demo app (bug 1625070 comment 30).

Status: RESOLVED → VERIFIED
See Also: → 1663152
Regressions: 1669275
See Also: → 1677314