Open Bug 1743051 Opened 3 years ago Updated 24 days ago

MOZ_X11_EGL/Nvidia: Partly broken partial present after suspend&resume

Categories

(Core :: Graphics: WebRender, defect)

x86_64
Linux
defect

Tracking

()

Tracking Status
firefox95 --- disabled
firefox96 --- disabled
firefox97 --- disabled
firefox98 --- verified disabled

People

(Reporter: jan, Unassigned)

References

(Blocks 3 open bugs)

Details

(Keywords: correctness, nightly-community)

Attachments

(4 files)

(Darkspirit from bug 1731172 comment 35)

Some tiles in the vertical middle are not updated correctly. It can be best seen when hovering lines on about:config.
gfx.webrender.allow-partial-present-buffer-age=false doesn't help, only gfx.webrender.max-partial-present-rects=0 helps.

Attached video 2021-11-26 02-29-32.mp4
Attached video 2021-11-26 02-41-20.mp4

The same with Gnome partial present debug view enabled (steps to enable: bug 1640858 comment 12):
Everything is displayed correctly in this mode, but the red area shows what would be changed with this mode disabled.

Those lines on about:config that didn't get a blue background on hovering are not within the red box.

This bug seems to occur only after suspend&resume on EGL/Nvidia and persists until Firefox restart.

bug 1712969 is about an existing cross-platform partial present bug and has multiple testcases. Could it be related?

Thanks, that's very interesting. So according to the video from comment 2, it looks like WR updates the buffer correctly but fails to report the correct damage rect in SwapBuffersWithDamage() (note: we currently always report only one combined rect, bug 1640712, so it's unlikely that this is wrongly handled by the driver). And comment 4 points to us not properly combining the damage for multiple tiles.

To me it looks like the culprit must be somewhere between the part where we calculate the damage region from the tile rects in calculate_dirty_rects(), called in draw_frame() here, and passing it down later in draw_frame() via composite_frame()->composite_simple()->set_buffer_damage_region(). Actually I think it has to be in the linked part in calculate_dirty_rects(). I'll try to come up with a build with some debug logging.
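To make the suspected failure mode concrete, here is a minimal sketch of folding per-tile dirty rects into the single combined damage rect mentioned above. It is not WebRender's code: `Rect`, `combine_damage`, and the tile sizes are hypothetical names and values chosen for illustration. It only shows that a redrawn tile left out of the combination ends up outside the reported damage, which would match tiles that are drawn to the buffer but not presented.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in device pixels: [x0, x1) x [y0, y1). Hypothetical type."""
    x0: int
    y0: int
    x1: int
    y1: int

    def union(self, other: "Rect") -> "Rect":
        # Smallest rect that covers both inputs.
        return Rect(min(self.x0, other.x0), min(self.y0, other.y0),
                    max(self.x1, other.x1), max(self.y1, other.y1))

    def contains(self, other: "Rect") -> bool:
        return (self.x0 <= other.x0 and self.y0 <= other.y0
                and self.x1 >= other.x1 and self.y1 >= other.y1)


def combine_damage(tile_dirty_rects: Iterable[Rect]) -> Optional[Rect]:
    """Fold per-tile dirty rects into one bounding rect; None if nothing is dirty."""
    combined = None
    for rect in tile_dirty_rects:
        combined = rect if combined is None else combined.union(rect)
    return combined


# Hypothetical layout: three vertically stacked tiles.
tiles = [Rect(0, 0, 1024, 256), Rect(0, 256, 1024, 512), Rect(0, 512, 1024, 768)]

# Expected case: every redrawn tile contributes to the damage rect.
full = combine_damage(tiles)

# Suspected faulty case: only the first tile is accounted for,
# so the middle tile falls outside the reported damage.
partial = combine_damage(tiles[:1])

print(full.contains(tiles[1]))     # middle tile covered by the damage rect
print(partial.contains(tiles[1]))  # middle tile missing from the damage rect
```

Because only one bounding rect is reported, a missing tile is only visible when it lies outside the union of the tiles that were counted. Whether the real bug has this shape is exactly what the debug logging would need to confirm.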

xrender suffered from a bug where main menu changes were delayed by one frame.
nvidia + gpu process + i3 seemed to have the same problem 2 months ago: bug 1733094 comment 14.

my naive/incompetent thought:

this bug: Main window might fall back from XShmPutImage to buggy (?) XPutImage on NV suspend&resume?

Hm, we only use that if we fall back to SW-WR, it shouldn't matter for HW-WR at all AFAIK.

When you have time, could you try https://treeherder.mozilla.org/jobs?repo=try&revision=ab4d61c02f60fc953b387cb58843c386a707aa8c and check if the debug output changes? Sorry for still not being able to do it myself :(

(In reply to Darkspirit from comment #6)
This bug was filed with X11-only driver 470.86. I will test 495 with MOZ_ENABLE_WAYLAND and report back.

Jan, just to make sure I'm on the right track, can you confirm that the following build is not affected? https://treeherder.mozilla.org/jobs?repo=try&revision=b9607af2acbc9b5959fddab42b0af29ac642f8c1

(In reply to Robert Mader [:rmader] from comment #7)

When you have time, could you try https://treeherder.mozilla.org/jobs?repo=try&revision=ab4d61c02f60fc953b387cb58843c386a707aa8c and check if the debug output changes? Sorry for still not being able to do it myself :(

Count looks the same before and after.

(In reply to Darkspirit from comment #8)

This bug was filed with X11-only driver 470.86. I will test 495 with MOZ_ENABLE_WAYLAND and report back.

Gnome Wayland/495 stable is somehow broken: Logging in to Wayland does not work: "Failed to grab modeset ownership": https://www.google.com/search?client=firefox-b-d&q=Failed+to+grab+modeset+ownership
Updating to Ubuntu 22.04dev did not help.

(In reply to Robert Mader [:rmader] from comment #9)

Jan, just to make sure I'm on the right track, can you confirm that the following build is not affected? https://treeherder.mozilla.org/jobs?repo=try&revision=b9607af2acbc9b5959fddab42b0af29ac642f8c1

Gnome X11/driver 495: Yes, that build is fine.
Regular Nightly:
(In reply to Darkspirit from comment #0)

gfx.webrender.allow-partial-present-buffer-age=false doesn't help, only gfx.webrender.max-partial-present-rects=0 helps.

Can you also test https://treeherder.mozilla.org/jobs?repo=try&revision=31a13646f62781744aa8c9c564bdeeebd7dbddc3 (and maybe do a quick video with the Gnome partial present debug view)?

Attached video try_comment11.mp4
Severity: -- → S3
See Also: → 1744664
See Also: 1744664

Still reproducible with Gnome X11, Nvidia GTX 1060, driver 495.46, Ubuntu 22.04 jammy.

The update from 495.44 to 495.46 seems to have fixed Gnome Wayland for me. Previously it was somehow not working anymore.
Xwayland window content is displayed, but still uses llvmpipe.
Gnome Wayland suffers from https://gitlab.gnome.org/GNOME/mutter/-/issues/1942 (=texture corruption all over the place), the partial present bug does not seem to occur there (yet?).

Has anyone had the chance to check if the 510 driver series is also affected by this?

Just got a report on Matrix that this still happens on the 510 driver series.

Depends on: 1751252
No longer blocks: 1737428

Presumably bug 1751252 nuked this?
(if any, I would wonder if the total blanket ban of partial present would still make sense with current drivers)

One would need to try :)
FWIW: the block only applies on X11, where very few apps use EGL, partly because Mesa drivers are broken there (they don't support transparency). So I wouldn't bet on Nvidia devs having spent much time fixing this issue.

On Wayland, where most development happens anyway these days, things seem to work fine AFAIK.

See Also: → 1882319

I have also been having this problem for a long time. Currently using Firefox 126.0 on X11 with nvidia 550 driver. Slackware64-current.
I used to be able to close and reopen affected tabs (with CTRL+SHIFT+T) and they would be OK again, but recently I have needed to completely exit and restart Firefox to fix it. Possibly a change in FF 124 or 125?

I have tested with gfx.x11-egl.force-disabled but the problem still occurs using the GLX driver. Setting gfx.webrender.software to true seems to be the only fix.

Some examples of the problem after resume (sorry, no screenshots):
Grafana graphs will not show any of the lines, only the tooltip when hovering the mouse over. Bar charts will be unfilled when they should be filled.
Frigate NVR will not display the latest camera snapshot on the main page 90% of the time when reloading.
Strava maps will only show the town names but none of the actual map detail.

This problem can also be reproduced by switching to a text console using CTRL+ALT+F1 and then back to the X11 console.
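The two pref-based workarounds reported in this thread (gfx.webrender.max-partial-present-rects=0 and gfx.webrender.software=true) can also be kept in a profile's user.js instead of being set by hand in about:config. The following is a minimal sketch, assuming that approach; `format_user_pref` is a hypothetical helper, and only the pref names and values come from the comments above.

```python
def format_user_pref(name: str, value) -> str:
    """Render one pref as a user.js line (hypothetical helper)."""
    # bool must be checked before int, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        rendered = "true" if value else "false"
    elif isinstance(value, int):
        rendered = str(value)
    else:
        escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
        rendered = f'"{escaped}"'
    return f'user_pref("{name}", {rendered});'


# Pref names and values as reported in this bug; pick whichever one helps.
workarounds = {
    "gfx.webrender.max-partial-present-rects": 0,
    "gfx.webrender.software": True,
}

user_js = "\n".join(format_user_pref(k, v) for k, v in workarounds.items())
print(user_js)
```

Note that gfx.webrender.software switches to software rendering entirely, so it is the heavier of the two workarounds.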
