Closed Bug 1754685 Opened 3 years ago Closed 3 years ago

Terrible scrolling performance on Canvas LMS homepage

Categories

(Core :: Graphics: WebRender, defect)

Firefox 99
defect

Tracking

()

VERIFIED FIXED
99 Branch
Tracking Status
firefox-esr91 --- unaffected
firefox97 --- unaffected
firefox98 --- unaffected
firefox99 --- verified

People

(Reporter: noszalyaron4, Assigned: gw)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: regression)

Attachments

(2 files)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:100.0) Gecko/20100101 Firefox/100.0

Steps to reproduce:

  1. Using latest nightly on Linux with proprietary NVidia drivers
  2. Visiting a Canvas LMS instance's homepage
  3. Scroll

Actual results:

Scrolling was very laggy.
Here's a profiling: https://share.firefox.dev/34M402a

Expected results:

No laggy scrolling on such a "simple" site.

The Bugbug bot thinks this bug should belong to the 'Core::Canvas: 2D' component, and is moving the bug to that component. Please revert this change in case you think the bot is wrong.

Component: Untriaged → Canvas: 2D
Product: Firefox → Core

I've reduced the issue to the little boxes of lectures and the transparent background of the "Dashboard" text ("Vezérlőpult" in hungarian), it seems their blending makes the rendering long. Though its most definitely not a minimal working example.

I have bisected it with mozregression-gui and found that this commit caused the regression:

2022-02-10T10:19:34.507000: INFO : Narrowed integration regression window from [8e9c1ffd, 41535e3d] (3 builds) to [8e9c1ffd, b7f88e5c] (2 builds) (~1 steps left)
2022-02-10T10:19:34.514000: DEBUG : Starting merge handling...
2022-02-10T10:19:34.514000: DEBUG : Using url: https://hg.mozilla.org/integration/autoland/json-pushes?changeset=b7f88e5c537bb7d64ba9d4cb58b052650d510638&full=1
2022-02-10T10:19:34.515000: DEBUG : redo: attempt 1/3
2022-02-10T10:19:34.515000: DEBUG : redo: retry: calling _default_get with args: ('https://hg.mozilla.org/integration/autoland/json-pushes?changeset=b7f88e5c537bb7d64ba9d4cb58b052650d510638&full=1',), kwargs: {}, attempt #1
2022-02-10T10:19:34.516000: DEBUG : urllib3.connectionpool: Resetting dropped connection: hg.mozilla.org
2022-02-10T10:19:35.787000: DEBUG : urllib3.connectionpool: https://hg.mozilla.org:443 "GET /integration/autoland/json-pushes?changeset=b7f88e5c537bb7d64ba9d4cb58b052650d510638&full=1 HTTP/1.1" 200 None
2022-02-10T10:19:35.822000: DEBUG : Found commit message:
Bug 1749380 - Improve how WR handles bounding rects for off-screen surfaces r=gfx-reviewers,kvark

This patch introduces a number of subtle but important changes
to how we deal with off-screen surfaces. The overall goals are:

 - Improve rendering correctness in a number of edge cases.
 - Begin reducing complexity related to surfaces, scaling
   factors, surface size adjustments and clipping.
 - Improve CPU performance by removing some per-primitive work.
 - Simplify implementation of future SVG and CSS filters by
   having explicit support for picture rects + inflation regions.
 - Lay the groundwork for caching child picture surfaces,
   reduction of per-primitive work during visibility pass,
   simplifying picture code.

Unfortunately, the nature of the changes make it impossible to
split up in to small isolated patches. Details below:

* Introduce `LocalRectKind` concept. This allows us to separate
  out the bounding rect of the surface (a group of primitives
  backed by a texture) from the bounding rect of the picture
  compositing that surface (e.g. a drop-shadow which draws the
  surface once at the local origin and once at a specific offset
  + blur-radius). This fixes a number of correctness bugs we have
  related to culling, clipping, invalidation regions of complex
  primitives such as drop-shadows and blur filters. Importantly,
  it makes it simpler to implement (or fix) SVG filter chains,
  backdrop-filter implementations.

* Establish raster roots for all off-screen surfaces. Every off-screen
  surface uses the spatial node of the enclosing stacking context as
  a coordinate system root, ensuring that each off-screen surface is
  drawn in a 2D coordinate system, with appropriate scaling factors
  applied to ensure high quality rendering. The primary goal is to make
  it possible to correctly inflate and clip off-screen surfaces, removing
  some correctness issues we currently have with complex filters interacting
  with transforms. The initial work here doesn't reduce complexity a huge
  amount, but will allow us to simplify large parts of the picture/surface
  handling code in future, as well as simplify a number of shaders that
  currently must handle arbitrarily complex transform matrices. This will
  also allow us to simplify the implementation of features such as
  mix-blend-mode and backdrop-filter, which rely on readback and UV mapping
  from the parent surface.

* Remove concepts of `estimated` and `precise` local rects for pictures. This
  is both a performance optimization and a code simplification. Instead, we
  only determine the estimated local rect during bounding rect propagation,
  and rely on the clipping regions from the tile dirty regions to reduce which
  parts of the picture we allocate if drawing to an off-screen surface. This
  removes some per-primitive work during the visibility pass, and also means
  we can rely on the final picture bounding rect from the start of the visibility
  pass. This also removes much of the complexity in `take_context` where we
  previously determined surface scale factors and device pixel ratio - instead
  these can be determined earlier during `propagate_bounding_rects`.

* Remove some complexity in `update_prim_visibility`. This is still recursive,
  but follow up patches will aim to remove this recursion and integrate this
  pass with the picture graph (similar to how `propagate_bounding_rects` works).

* Remove `PictureOptions` struct. Instead, store `inflate_if_required` with
  the Blur filter enum, which is the only place that uses it.

* Remove `root_scaling_factor` from text runs - this is handled implicitly
  by the surface device-pixel scale.

* Skip calling `update_clip_task` for pass-through pictures (since they have
  no defined local rect).

* Improve scaling factors used for determining the render task cache size for
  complex line decorations.

Differential Revision: https://phabricator.services.mozilla.com/D137569
Component: Canvas: 2D → Graphics: WebRender
Regressed by: 1749380

There's definitely a bug related to that patch, which I'm working on this week. However, the profile shows the time spent is in software webrender, which seems unexpected. Could you post the contents of your about:support?

Flags: needinfo?(noszalyaron4)
Attached file about:support
Oh you're right, last time I checked it wasn't Webrender (software):

OMG, enabling it in about:config fixes the scrolling on canvas entirely and brought back 144hz scrolling which I didn't remember losing but now I can't believe I could again :D

Here's the profiling now, it seems to coincide with my perception that everything seems normal: https://share.firefox.dev/3Jq86w1

Flags: needinfo?(noszalyaron4)

Seems like this is mostly a swgl perf regression from bug 1749380 in that case, which hopefully Glenn's follow up work will address.

But there's still the question of why you were using software webrender rather than regular webrender.

@noszalyaron4 could you please try running mozregression again, and this time checking whether whether it says Webrender (= good) or Webrender (software) (= bad).

Thanks!

Blocks: wr-perf
Severity: -- → S3
Flags: needinfo?(noszalyaron4)

Set release status flags based on info from the regressing bug 1749380

Has Regression Range: --- → yes
Assignee: nobody → gwatson

Minimal test case:

<!DOCTYPE html>
<html>
  <head>
    <style type="text/css">
      .ic-DashboardCard {
        border-radius: 4px;
        overflow: hidden;
        width: 262px;
      }
    </style>

  </head>
  <body>
    <div class="ic-DashboardCard">
      <svg viewBox="0 0 1920 1920" style="width: 1em; height: 1em;" width="1em" height="1em">
        <g>
          <path d="M677.647 16v338.936h112.941V129.054h1016.47V919.53h-225.994v259.765L1321.412 919.53h-79.172V467.878H0v1016.47h338.71v418.9l417.996-418.9h485.534v-451.877h32.753l419.125 419.124v-419.124H1920V16H677.647zM338.79 919.563h564.706v-112.94H338.79v112.94zm0 225.883h338.936v-113.054H338.79v113.054zM112.94 580.706h1016.47v790.701H710.4L451.652 1631.06v-259.652h-338.71V580.706z">
          </path>
        </g>
      </svg>
    </div>
  </body>
</html>

In this case, performance isn't terrible since the test case is small. However, WR is still creating a 3072 x 3072 render target here, which seems clearly incorrect.

When running the complete test case above, there are > 10 of those large targets, which will explain the performance issue there.

It seems like the scale factor determination is incorrect - we have a stacking context that is 1920 x 1920 in local coords due to the SVG box, with a 16x16 image. The scale transform of 0.00833 isn't being applied in a way to reduce the surface size.

The problem is due to ensuring that we always draw a surface with a minimum scale of 1.0 [1] which can cause very large surfaces in cases like this, where it's not necessary. I'll do some try runs without that min_scale and we'll find a better way to handle the cases that break without this.

[1] https://searchfox.org/mozilla-central/source/gfx/wr/webrender/src/picture.rs#5550

@jnicol:
Yeah, I've found this commit which reasonably explains why sw webrender was used :D Guess I will update my driver.

2022-02-15T10:20:27.171000: INFO : Narrowed integration regression window from [61fba772, e947de18] (3 builds) to [a85a8dbb, e947de18] (2 builds) (~1 steps left)
2022-02-15T10:20:27.175000: DEBUG : Starting merge handling...
2022-02-15T10:20:27.175000: DEBUG : Using url: https://hg.mozilla.org/integration/autoland/json-pushes?changeset=e947de18d5805af4d75a43ff1fc4a561a2fed3b5&full=1
2022-02-15T10:20:27.175000: DEBUG : redo: attempt 1/3
2022-02-15T10:20:27.175000: DEBUG : redo: retry: calling _default_get with args: ('https://hg.mozilla.org/integration/autoland/json-pushes?changeset=e947de18d5805af4d75a43ff1fc4a561a2fed3b5&full=1',), kwargs: {}, attempt #1
2022-02-15T10:20:27.176000: DEBUG : urllib3.connectionpool: Resetting dropped connection: hg.mozilla.org
2022-02-15T10:20:28.380000: DEBUG : urllib3.connectionpool: https://hg.mozilla.org:443 "GET /integration/autoland/json-pushes?changeset=e947de18d5805af4d75a43ff1fc4a561a2fed3b5&full=1 HTTP/1.1" 200 None
2022-02-15T10:20:28.399000: DEBUG : Found commit message:
Bug 1742994 - Bump required nvidia driver version for HW-WR to 470.82 (i.e. always use EGL), r=aosmond

We currently require `460.32.03`, however the `460` driver series got
succeeded by `470` LTS one, which supports the same hardware.
Thus all users can (and should) already have updated to `470.82`.
Distros like Ubuntu appear to ship the latest drivers as security
updates to all supported versions.

Bumping the requirement to `470.82` has the great advantage of being
in sync with the required version for EGL, i.e. all Nvidia users
with HW-WR will use EGL by default (minus bugs like bug 1739611).
This solves a couple of Nvidia+GLX related bugs at the price of
downgrading some users with outdated/unsupported drivers to SW-WR.
For these users, SW-WR likely gives a more stable experience anyway.
If they want the performance benefits of HW-WR, all they need to do
is to update their drivers (again, something they should do anyway
for security reasons).

Differential Revision: https://phabricator.services.mozilla.com/D132162
Flags: needinfo?(noszalyaron4)

Oh now that I look at it isn't 470.103.01>470.82.0.0? Though I'm not an expert at versions.

Thanks @noszalyaron4! I agree, I would have thought that 470.103 is more recent than 470.82. Perhaps a bug in our blocklisting logic. I'll file a new issue for that, to keep it separate from this bug which Glenn is using for the poor performance when using SWGL.

In the meantime if you keep gfx.webrender.all as true in about:config you should continue to get hardware acceleration.

See Also: → 1755441

Fixed by backout. I will be re-landing this patch series next week, with changes that fix this issue. Please re-open if it occurs again.

Status: UNCONFIRMED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 99 Branch
Flags: qe-verify+

I managed to reproduce this issue on Linux x86_64(Ubuntu 20.04) on Nightly 99.0a1(20220209095640). Confirmed as fixed on Firefox 99.0b6(20220320185956) and Nightly 100.0a1(20220321214243) on the Linux x86_64(Ubuntu 20.04), macOS 11 and Win10 64-bit.

Status: RESOLVED → VERIFIED
Flags: qe-verify+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: