Closed Bug 1708416 Opened 4 years ago Closed 4 years ago

[wayland] Implement n-buffering for the software backend (Basic/SWWR)

Categories

(Core :: Widget: Gtk, enhancement)

Product:

Component:

Version:

Other Branch

Type:

enhancement

Priority:

Not set

Severity:

--

Tracking

()

Status:

RESOLVED FIXED

Milestone:

91 Branch

Tracking Flags:

Tracking

Status

firefox91

---

fixed

People

(Reporter: rmader, Assigned: rmader)

References

(Blocks 1 open bug)

Details

Attachments

(2 files)

Bug 1708416 - Implement n-buffering for the Wayland software backend, r?stransky,lsalzman 4 years ago Robert Mader [:rmader] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 1708416 - Implement n-buffering for the Wayland software backend, r=stransky 4 years ago Robert Mader [:rmader] 48 bytes, text/x-phabricator-request		Details \| Review

Robert Mader [:rmader]

Assignee

Description

•

4 years ago

Following up on bug 1693472 comment 39 and some matrix discussions: Traditionally software rendering in FF was single buffered - fast and who cares about tearing.

For Wayland, in order to fully comply with the spec and at the same time stay fast, we should optimally implement double buffering. Optimally we can reuse EGL infrastructure here, which already has partial damage support for this case.

Martin Stránský [:stransky] (ni? me)

Comment 1

•

4 years ago

I'd rather wait here until we really need it. It means significant performance loss while page scrolling and video playback and generally an small screen updates line menu rendering.

Vlad Zahorodnii [:zzag]

Comment 2

•

4 years ago

If you keep a journal with last N surface damage regions and assign a buffer age or something like that to every buffer, there won't be huge performance regressions.

Robert Mader [:rmader]

Assignee

Comment 3

•

4 years ago

(In reply to Vlad Zahorodnii [:zzag] from comment #2)

If you keep a journal with last N surface damage regions and assign a buffer age or something like that to every buffer, there won't be huge performance regressions.

Yes, agree with that. Theoretically there should only be overhead when the damage region over the last 2-3 frames changed, which is not the case for scrolling or fullscreen video. For that cases we should be able to have equal performance, if implemented right.

Robert Mader [:rmader]

Assignee

Comment 4

•

4 years ago

Martin: just FYU, I had a longer chat with Lee Salzman and Markus Stange at #gfx-firefox about this. Apparently CA has similar limitations, but it's not clear if we can reuse their solution. In any case we can make SWWR work for this case and likely reuse the EGL damage tracking logic. Basic is about to be abandoned soon anyway, so no reason to worry about that.

Robert Mader [:rmader]

Assignee

Comment 5

•

4 years ago

Attached file Bug 1708416 - Implement n-buffering for the Wayland software backend, r?stransky,lsalzman — Details

In order to fully comply with the spec and work on all compositors.
This is somewhat inspired by SurfacePoolCA, moving buffers between
buckets - in use or available. The idea is to make it easy to expand
this further to share a common buffer pool between multiple surfaces.

It works similar to EGL buffer handling and reuses the partial damage
logic in WR by implementing buffer age.

Notable changes to the current implementation:

always draw directly into shm-buffer memory - no caching.
drawing is purely driven by the compositor / the VsyncSource.
No frame callbacks or threads apart from buffer release
callbacks.
Buffer objects are construct only, i.e. no resizing. This will
hopefully simplify buffer sharing between surfaces.

One of the goals here is to handle different compositor behaviour
efficiently - if a compositor releases buffers early, we want to
reuse a single buffer whenever possible. If a compositor holds buffers
longer, we want to minimize the area we need to redraw to older buffers.
This is archived by always using the last released buffer. The expected
usual buffer age is 1 in the fast release case and 2-3 in the
holding case.

This changes buffer handling quite significantly, so propper testing
is due. However, as the overall architecture is somewhat simpler than
before, it may improve stability in the long term.

Phabricator Automation

Updated

•

4 years ago

Assignee: nobody → robert.mader

Status: NEW → ASSIGNED

Robert Mader [:rmader]

Assignee

Comment 6

•

4 years ago

Here's a try build: https://treeherder.mozilla.org/jobs?repo=try&revision=bfde80fd01d54484f3890a52998c40744cc86c7b

So far I only tested on Mutter with buffer holding enabled or disabled. First test are encouraging, however I still experience certain crashes which might need some additional changes in other parts of the code.

Martin: This would replace a lot of code you worked on a lot with a quite different approach - so I'd love to your opinion/thoughs if this looks reasonable to you.

Robert Mader [:rmader]

Assignee

Updated

•

4 years ago

See Also: → 1693472

Robert Mader [:rmader]

Assignee

Updated

•

4 years ago

Depends on: 1709606

Robert Mader [:rmader]

Assignee

Updated

•

4 years ago

Summary: [wayland] Implement double buffering for the software backend (Basic/SWWR) → [wayland] Implement n-buffering for the software backend (Basic/SWWR)

Phabricator Automation

Updated

•

4 years ago

Attachment #9220347 - Attachment description: Bug 1708416 - RFC: Implement n-buffering for the Wayland software backend, r?stransky,lsalzman → Bug 1708416 - Implement n-buffering for the Wayland software backend, r?stransky,lsalzman

Robert Mader [:rmader]

Assignee

Comment 7

•

4 years ago

I did a more extensive round of testing with the latest revision. Crashes are gone now as far as I can see, because of better refcounting. I compared two optimized but non-release builds with and without the patch (1, 2). On my main mashine (Thinkpad T460p, skylake) I tested all major compositors (Mutter with and without buffer-holding, KWin, Sway and Weston), on an old Thinkpad T400 I only tested stock Mutter. Each time I tested both Basic and SWWR and checked sides like testufo.com and vsynctester.com.

So far my findings are:

performance appears to be pretty much equal as far as reported FPS etc. are concerned
on the T400, I had the subjective impression that sometimes scrolling felt smoother with the old/current implementation, both on SWWR and basic. I'm don't know yet why that could be and if it's really the case - maybe the old implementation skips frames sometimes in smarter way?
unrelated to this patch, the combination of SWWR, Wayland and Kwin was heavily stuttering on testufo.com, both on the old and new backend. It doesn't happen with the Basic or HWWR compositor (it's doing great there), with the Xwayland backend or any other compositor. Also, it doesn't appear to be related to frame callbacks (WaylandVsyncSource). This needs investigation in another bug - maybe I'm just hitting very unfortunate timings or so.

1: https://treeherder.mozilla.org/jobs?repo=try&revision=4fc6cc5b6e9145b92f50d379039e6a1cf862a862&selectedTaskRun=GYm046RnQHGIPxSckEBXJw.0
2: https://treeherder.mozilla.org/jobs?repo=try&revision=56a78262fd441ead241ae4ef4deead4515bc71cd&selectedTaskRun=QBqdy_oqQpWvZK3ypzD6RQ.0

Vlad Zahorodnii [:zzag]

Comment 8

•

4 years ago

the combination of SWWR, Wayland and Kwin was heavily stuttering on testufo.com

There are a couple of reasons why testufo.com stutters when running kwin

(a) with SWWR, the entire Firefox window is damaged on every frame. With the Basic renderer, only some parts of the window are damaged, therefore there are less pixels to upload to the gpu
(b) by default, kwin doesn't start compositing right after a vblank. usually, it will perform compositing somewhere around the middle of the vblank interval. As far as I know, most compositors don't do it (at least, not by default). KWin has a mechanism to move compositing closer to vblank, but in this instance, it fails. Ideally, data from wl_shm buffers needs to be uploaded to opengl textures asynchronously, which is on our todo list. In meanwhile, you can change the latency policy in system settings to "Force smoothest animations."

Vlad Zahorodnii [:zzag]

Comment 9

•

4 years ago

(c) kwin uploads data from wl_shm buffers to opengl textures during compositing cycle :/, not when a buffer is attached to a surface

Robert Mader [:rmader]

Assignee

Comment 10

•

4 years ago

(In reply to Vlad Zahorodnii [:zzag] from comment #8)

(a) with SWWR, the entire Firefox window is damaged on every frame. With the Basic renderer, only some parts of the window are damaged, therefore there are less pixels to upload to the gpu

Oof, indeed, WRs partial damage logic really doesn't like that site :(

Robert Mader [:rmader]

Assignee

Updated

•

4 years ago

See Also: → 1710120

Robert Mader [:rmader]

Assignee

Updated

•

4 years ago

Depends on: 1710532

Robert Mader [:rmader]

Assignee

Updated

•

4 years ago

Depends on: 1711094

Robert Mader [:rmader]

Assignee

Comment 11

•

4 years ago

New try build: https://treeherder.mozilla.org/jobs?repo=try&revision=169eae108d81a3e614e4594ca20ba7dcfc8d63f6

Robert Mader [:rmader]

Assignee

Comment 12

•

4 years ago

That didn't go well, here's a new one: https://treeherder.mozilla.org/jobs?repo=try&revision=10317c75ac8ccb8d15f8edbad30393b399f02900

Robert Mader [:rmader]

Assignee

Comment 13

•

4 years ago

And another one, fixing a trivial crash. Conveniently it confirmed that at least on SW-WR, a buffer release callback can arrive after the WindowSurface was otherwise unreffed (or, before bug 1709606, explicitly destroyed), making us access freed memory under certain circumstances.

https://treeherder.mozilla.org/jobs?repo=try&revision=e6084a335971bf4a98417bd3939a9fa95ac325e0

Robert Mader [:rmader]

Assignee

Comment 14

•

4 years ago

Some more observations from testing this on my Thinkpad T400:

frame drops seem to heavily depend on when compositor settings. The device is heavily bandwidth limited, so fullscreen updates apparently often don't make it. This is for example visible on Weston, which by default starts compositing at the 7ms mark while Mutter usually at 2ms. Using the default settings of 7ms, both backends are similar. When changing it to 2ms, to make it behave more similar to Mutter, things go in favor of the old backend. IIUC this is because the new backend is "dumper", i.e. drawing and submitting frames is both driven by frame callback WaylandVsyncSource. The old backend uses WaylandVsyncSource for drawing, but has its own frame callbacks for actually attaching and commiting the surface, potentially caching things in between. From my understanding, the old backend, while sometimes working better on compositors like Mutter, does kinda work around a compositor issue. I.e. compositors should optimize callback times smarter. In conclusion I think the new backend behaves more predictable and how compositors would clients assume to behave, and compositors should be able to make it behave similar well.
On my main mashine, both backends appear to perform very similar.
With the old backed, I occasional observed freezes on SW-WR when e.g. playing videos for a longer time. The new backend appears to very solid, from all I can see, which due to it's lower complexity wouldn't be too surprising.

Robert Mader [:rmader]

Assignee

Comment 15

•

4 years ago

This just became more important/delicate after bug 1711118 landed, which would make this the default.

See Also: → 1711118

Robert Mader [:rmader]

Assignee

Updated

•

4 years ago

See Also: → 1694038

Comment 16

•

4 years ago

Pushed by robert.mader@posteo.de: https://hg.mozilla.org/integration/autoland/rev/ee3c15c9dba5 Implement n-buffering for the Wayland software backend, r=stransky

Natalia Csoregi [:nataliaCs]

Comment 17

•

4 years ago

Backed out for non-unified failure on RenderCompositorSWGL.cpp

backout: https://hg.mozilla.org/integration/autoland/rev/d7166b8e97a98a98e5d3a5df6f975051dcd47c38

push: https://treeherder.mozilla.org/jobs?repo=autoland&selectedTaskRun=aXabMa7CQdaGGRLBZkRa4Q.0&revision=ee3c15c9dba52306591473b3261e3bfce266cd71

failure log: https://treeherder.mozilla.org/logviewer?job_id=340022121&repo=autoland&lineNumber=7558

[task 2021-05-18T13:43:06.337Z] 60:42.08 /builds/worker/fetches/clang/bin/clang++ -isysroot /builds/worker/fetches/MacOSX10.12.sdk -std=gnu++17 --target=x86_64-apple-darwin -o /dev/null -c -fvisibility=hidden -fvisibility-inlines-hidden -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -fstack-protector-strong -ftrivial-auto-var-init=pattern -DDEBUG=1 -DOS_POSIX=1 -DOS_MACOSX=1 -DMOZ_HAS_MOZGLUE -DMOZILLA_INTERNAL_API -DIMPL_LIBXUL -DSTATIC_EXPORTABLE_JS_API -I/builds/worker/checkouts/gecko/gfx/webrender_bindings -I/builds/worker/checkouts/gecko/obj-x86_64-apple-darwin/gfx/webrender_bindings -I/builds/worker/checkouts/gecko/obj-x86_64-apple-darwin/ipc/ipdl/_ipdlheaders -I/builds/worker/checkouts/gecko/ipc/chromium/src -I/builds/worker/checkouts/gecko/ipc/glue -I/builds/worker/checkouts/gecko/obj-x86_64-apple-darwin/dist/include -I/builds/worker/checkouts/gecko/obj-x86_64-apple-darwin/dist/include/nspr -I/builds/worker/checkouts/gecko/obj-x86_64-apple-darwin/dist/include/nss -DMOZILLA_CLIENT -include /builds/worker/checkouts/gecko/obj-x86_64-apple-darwin/mozilla-config.h -Qunused-arguments -Qunused-arguments -Wall -Wbitfield-enum-conversion -Wempty-body -Wignored-qualifiers -Woverloaded-virtual -Wpointer-arith -Wshadow-field-in-constructor-modified -Wsign-compare -Wtype-limits -Wunreachable-code -Wunreachable-code-return -Wwrite-strings -Wno-invalid-offsetof -Wclass-varargs -Wempty-init-stmt -Wfloat-overflow-conversion -Wfloat-zero-conversion -Wloop-analysis -Wno-range-loop-analysis -Wc++2a-compat -Wcomma -Wimplicit-fallthrough -Wunused-function -Wunused-variable -Wstring-conversion -Wtautological-overlap-compare -Wtautological-unsigned-enum-zero-compare -Wtautological-unsigned-zero-compare -Wno-error=tautological-type-limit-compare -Wno-inline-new-delete -Wno-error=deprecated-declarations -Wno-error=array-bounds -Wno-error=backend-plugin -Wno-error=free-nonheap-object -Wno-error=return-std-move -Wno-error=atomic-alignment -Wno-error=deprecated-copy -Wformat -Wformat-security -Wno-gnu-zero-variadic-macro-arguments -Wno-psabi -Wno-unknown-warning-option -fno-sized-deallocation -fno-aligned-new -fno-exceptions -fno-strict-aliasing -stdlib=libc++ -fPIC -fno-rtti -ffunction-sections -fdata-sections -fno-exceptions -fno-math-errno -pthread -pipe -g -O3 -fno-omit-frame-pointer -funwind-tables -I/builds/worker/checkouts/gecko/obj-x86_64-apple-darwin/dist/include/cairo -Werror=switch -ferror-limit=0 /builds/worker/checkouts/gecko/gfx/webrender_bindings/RenderEGLImageTextureHost.cpp -fsyntax-only
[task 2021-05-18T13:43:07.130Z] /builds/worker/checkouts/gecko/gfx/webrender_bindings/RenderCompositorSWGL.cpp:263:15: error: no member named 'gfxVars' in namespace 'mozilla::gfx'
[task 2021-05-18T13:43:07.130Z]   return gfx::gfxVars::WebRenderMaxPartialPresentRects();
[task 2021-05-18T13:43:07.131Z]          ~~~~~^
[task 2021-05-18T13:43:07.280Z] 1 error generated.

Flags: needinfo?(robert.mader)

Natalia Csoregi [:nataliaCs]

Comment 18

•

4 years ago

Also failing several webdriver tests: https://treeherder.mozilla.org/jobs?repo=autoland&selectedTaskRun=aWZXALyCRXWmx1GWF0NIZQ.0&searchStr=linux%2C18.04%2Cx64%2Casan%2Copt%2Cweb%2Cplatform%2Ctests%2Ctest-linux1804-64-asan%2Fopt-web-platform-tests-wdspec-headless-e10s%2Cwdh1&revision=ee3c15c9dba52306591473b3261e3bfce266cd71
and
https://treeherder.mozilla.org/jobs?repo=autoland&selectedTaskRun=eAnDeWgHQae1h59zfBZEQQ.0&revision=84eb62a713ed2d8405ba800a9c930bac26d6cc0f&searchStr=wdh&group_state=expanded

Robert Mader [:rmader]

Assignee

Comment 19

•

4 years ago

Thanks, will look into it.

Flags: needinfo?(robert.mader)

Robert Mader [:rmader]

Assignee

Comment 20

•

4 years ago

So the approach here is currently on hold because it turned out that SW-WR uses the "native" compositor paths and thus can't easily reuse the buffer-age based paths of the "draw" compositor. Fixing the case here is most likely equal to solving partial damage for compositor integration (bug 1617498).

So far I assumed that there's only one solution for this problem, which is to make WR repaint the union of the current and previous dirty rects. However, it turns out that the CoreAnimation backend instead just blits back the needed areas from the previous frame[1]. In case of Wayland, a submitted/attached buffer is considered "locked" and the spec is not super clear about whether we are allowed to blit from an attached buffer:

Committing a pending wl_buffer allows the compositor to read the pixels in the wl_buffer. The compositor may access the pixels at any time after the wl_surface.commit request. When the compositor will not access the pixels anymore, it will send the wl_buffer.release event.[2]

In a quick chat with emersion at #wayland we concluded that the spec should be clarified - with a tendency to only allow reading from the compositor, and not possible stuff like in-place mutations which get reversed before buffer release. This would allow us to blit/memcopy the missing parts. I'll investigate if this is a feasible solution.

1: https://phabricator.services.mozilla.com/D51760
2: https://wayland.freedesktop.org/docs/html/apa.html

Robert Mader [:rmader]

Assignee

Updated

•

4 years ago

See Also: → 1713202

Robert Mader [:rmader]

Assignee

Comment 21

•

4 years ago

•

The approach described above, blitting from the previously submitted buffer, seems to work pretty well. That's great news as it:

allows us handle everything purely in the surface backend
thus also works for Basic
is pretty fast:
- doesn't have any overhead (apart from a few calculations) while scrolling or fullscreen updates
- doesn't have any overhead on compositor that quickly release buffers again

Will post a new patch soon. This also opens the door to have a single backend again, instead of maintaining two.

Robert Mader [:rmader]

Assignee

Comment 22

•

4 years ago

Attached file Bug 1708416 - Implement n-buffering for the Wayland software backend, r=stransky — Details

In order to fully comply with the spec and work on all compositors.
This is heavily inspired by NativeLayerCA:

Buffer are managed within a pool, similar to SurfacePool. This
makes sure there's always a buffer available, even if the compositor
holds buffers for longer than usually expected. One difference is
that buffers are not shared between windows though, reducing
complexity.
Partial damage is handled by blitting from the previous buffer,
likely currently held by the Wayland compositor. While the spec is
not completely clear whether this is legal, it strongly suggests it
is - and other Wayland developers suggested it should be.
This is almost identical to NativeLayerCA::HandlePartialUpdate,
with a small optimization for the common case of double buffering,
in witch case use a "damage history", inspired by the EGL partial
damage support. As single or double buffering are by far the most
common cases for shm-buffers, we limit this to double buffering to
avoid complexity.

Changes compared to the existing WindowSurfaceWayland include:

reduced complexity:
- less code
- no extra frame callbacks, avoiding extra steps like D117911
no compositor specific modes - this solution should always produce
fully correct output with minimal overhead.

Note: a previous version of this patch only worked for SW-WR but not
Basic and thus did not replace WindowSurfaceWayland. Now this patch
supports both, making it possible to fully replace WindowSurfaceWayland

Robert Mader [:rmader]

Assignee

Comment 23

•

4 years ago

•

~~https://treeherder.mozilla.org/jobs?repo=try&revision=7acaa12a06e15b144cfb022ded26e3064979027c~~
https://treeherder.mozilla.org/jobs?repo=try&revision=622fc15c840f50d6d1b49bfb3d8f21e2a3e7aef8

Try job which always damages the whole buffer, making it easy to check if contents are really valid.

Robert Mader [:rmader]

Assignee

Comment 24

•

4 years ago

I gave the build from the comment above some testing on Gnome or KDE. It always does full buffer damage and thus bypasses the optional damage hint, forcing the compositor to always show the full buffer content. Comparing the old and new backend shows that the old one often produces glitches, thus violating the spec. This may or may not be visible on a given compositor - on Mutter it currently would never show.
The glitches are especially visible on KDE with Basic, because of its different architecture. With the new backend I wasn't able to produce any glitches, neither on SW-WR nor Basic, on neither Gnome or KDE. Performance wise I wasn't able to spot a difference, but I also didn't make any measurements yet.

I think that speaks in favor of replacing the old with the new backend - Martin, opinions here?

Flags: needinfo?(stransky)

Phabricator Automation

Updated

•

4 years ago

Attachment #9227434 - Attachment description: WIP: Bug 1708416 - Implement n-buffering for the Wayland software backend → Bug 1708416 - Implement n-buffering for the Wayland software backend, r=stransky

Robert Mader [:rmader]

Assignee

Comment 25

•

4 years ago

Another try build, this time without any debug stuff: https://treeherder.mozilla.org/jobs?repo=try&revision=87f00af7f6095b406d5211f7a317042913554ce6

The new backend is enabled by default, both on SW-WR and Basic. The old one can be activated by setting widget.wayland.software-backend:1

Martin Stránský [:stransky] (ni? me)

Comment 26

•

4 years ago

Sure, I'll look at it.

Flags: needinfo?(stransky)

Martin Stránský [:stransky] (ni? me)

Comment 27

•

4 years ago

(In reply to Robert Mader [:rmader] from comment #24)

I think that speaks in favor of replacing the old with the new backend - Martin, opinions here?

Yes, we can replace it when it's tested enough in nightly and/or mozilla test suite with Wayland backend.

Comment 28

•

4 years ago

Pushed by robert.mader@posteo.de: https://hg.mozilla.org/integration/autoland/rev/f98734017be4 Implement n-buffering for the Wayland software backend, r=stransky

Cristian Brindusan [:cbrindusan]

Comment 29

•

4 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/f98734017be4

Status: ASSIGNED → RESOLVED

Closed: 4 years ago

status-firefox91: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 91 Branch

Robert Mader [:rmader]

Assignee

Comment 30

•

4 years ago

FTR: the new backend has now landed, but for is disabled by default - it can be enabled via the setting widget.wayland.multi-buffer-software-backend.enabled.

In theory it should/could perform better on compositors like Kwin and maybe on Gnome as well, but I haven't been able to verify this - and there may be further optimization potential. In any case it should be more correct - testing welcome :)

Concerning issues on sites like testufo.com and vsynctester.com (Comment 8): this has been fixed in bug 1713686, unfortunately only for Wayland.

See Also: → 1713686

You need to log in before you can comment on or make changes to this bug.