Closed Bug 1674618 (sw-wr-perf-brush-yuv) Opened 4 years ago Closed 3 years ago

Scrolling https://www.pixellot.tv/ is slow with sw-wr

Categories

(Core :: Graphics: WebRender, defect)

defect

Tracking

()

RESOLVED FIXED
88 Branch
Tracking Status
firefox87 --- fixed
firefox88 --- fixed

People

(Reporter: jrmuizel, Assigned: lsalzman)

References

(Blocks 1 open bug)

Details

Attachments

(3 files, 1 obsolete file)

A lot of time is being spent in the fast paths, so I expect we might be doing something bad at a higher level.
https://share.firefox.dev/3kL885P

Blocks: sw-wr-perf
Summary: Scrolling https://www.pixellot.tv/ is slow sw-wr → Scrolling https://www.pixellot.tv/ is slow with sw-wr
Severity: -- → S3

Investigating...

Assignee: nobody → bwerth

https://atlas-rogues.com is another site with exceptionally slow scrolling. I'll profile that as well and if it has a meaningfully different profile, open a new bug for it.

(In reply to Brad Werth [:bradwerth] from comment #2)

> https://atlas-rogues.com is another site with exceptionally slow scrolling. I'll profile that as well and if it has a meaningfully different profile, open a new bug for it.

Profiling this site shows all the time being spent in cs_clip_image_TEXTURE_2D_frag::run. That's very different from the profile for the pixellot.tv site. I'll open a new bug once I understand how to characterize this new problem.

An optimized Nightly profile of www.pixellot.tv scrolling shows that the top contributors are:

  • 24% brush_image_ALPHA_PASS_frag::swgl_drawSpanRGBA8
  • 15% brush_mix_blend_ALPHA_PASS_frag::brush_fs
  • 12% brush_opacity_ALPHA_PASS_frag::swgl_drawSpanRGBA8
  • 12% brush_image_ALPHA_PASS_TEXTURE_2D_frag::swgl_drawSpanRGBA8

Since it's an optimized build, there's essentially no more detailed breakdown.

A debug Nightly profile of www.pixellot.tv scrolling shows more detail, but skews the contribution numbers. Focusing on the largest contributor in the release profile, brush_image_ALPHA_PASS_frag::swgl_drawSpanRGBA8, there are two major sub-contributors:

  • 40% glsl::textureLinearUnpackedRGBA8
  • 35% blend_pixels_RGBA8

I'm not sure which of these are considered slow or fast paths, but I'll see if I can find anything to speed up either function.
Depends on: sw-wr-perf-nearest
Attached file svg data-icon file-video.html (obsolete)

The slow scrolling on pixellot.tv seems to be triggered by the use of SVG paths. This testcase is an isolated example taken from the site's data-icon=file-video SVG.

Analyzing the testcase in a release build of wrench with PRINT_TIMINGS enabled, the timing data from the first captured slow frame is:

 20.734ms draw(composite, 4): 1937408 pixels in 1892 rows (avg 1024.000000 pixels/row, 10.701992ns/pixel)
  5.892ms draw(composite, 2): 468980 pixels in 716 rows (avg 655.000000 pixels/row, 12.562553ns/pixel)
  2.597ms draw(composite, 1): 366592 pixels in 358 rows (avg 1024.000000 pixels/row, 7.085236ns/pixel)
  2.045ms draw(composite, 1): 366592 pixels in 358 rows (avg 1024.000000 pixels/row, 5.578810ns/pixel)
  2.482ms draw(composite, 1): 444416 pixels in 434 rows (avg 1024.000000 pixels/row, 5.585960ns/pixel)
  5.498ms draw(composite, 3): 794844 pixels in 1458 rows (avg 545.160494 pixels/row, 6.916766ns/pixel)
  6.060ms draw(composite, 4): 1719296 pixels in 2048 rows (avg 839.500000 pixels/row, 3.524591ns/pixel)
  0.275ms draw(composite, 2): 11456 pixels in 716 rows (avg 16.000000 pixels/row, 24.039368ns/pixel)
  0.614ms draw(composite, 1): 91840 pixels in 140 rows (avg 656.000000 pixels/row, 6.688055ns/pixel)
Finish

Subsequent frames were similar, only somewhat faster: the first two draws completed in about half the time they took in this first frame.

Comment on attachment 9190682 [details]
svg data-icon file-video.html

This testcase shows a real problem, but it's not the problem that manifests on the pixellot.tv site. Bug 1680821 has been opened to address the problem demonstrated by this testcase.

Attachment #9190682 - Attachment is obsolete: true
Attached file skew-blend-scroll.html

This testcase shows a greatly reduced example of what's slowing down the pixellot.tv site. The slowdown occurs when scrolling through a section of the page where a very large div has been skewed and is using mix-blend-mode: multiply. On the pixellot.tv site, the corresponding design element is white and is layered behind other elements, so it's not obvious that there is a large skewed, blended div in the background.

Wrench invocation: cargo run --release --features "software" -- --software show ~/wr-capture/
PRINT_TIMINGS output from the testcase, first frame only:

 25.764ms draw(composite, 6): 2163712 pixels in 2873 rows (avg 753.119387 pixels/row, 11.907307ns/pixel)
  3.502ms draw(composite, 2): 606920 pixels in 825 rows (avg 735.660606 pixels/row, 5.769454ns/pixel)
  1.714ms draw(composite, 1): 320512 pixels in 313 rows (avg 1024.000000 pixels/row, 5.348851ns/pixel)
  4.327ms draw(composite, 3): 1183744 pixels in 1536 rows (avg 770.666667 pixels/row, 3.655763ns/pixel)
  4.101ms draw(composite, 1): 135168 pixels in 512 rows (avg 264.000000 pixels/row, 30.338993ns/pixel)
  6.743ms draw(composite, 2): 1048576 pixels in 1024 rows (avg 1024.000000 pixels/row, 6.430601ns/pixel)
  2.126ms draw(composite, 2): 328286 pixels in 358 rows (avg 917.000000 pixels/row, 6.475457ns/pixel)
Finish

Here's an updated profile: https://share.firefox.dev/3jRPF7Y
Frame times are down from 80-90ms to 30-40ms.

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → WORKSFORME

Let's keep this open as performance is still not great.

Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---

It looks like the top of the pixellot.tv page draws a YUV video in the background which has to punt to brush_yuv_image. We could do better here by utilizing the strategies from bug 1692731 to speed up the upscaling filter for YUV textures like I did for RGBA ones...

Depends on: sw-wr-perf-linear
Alias: sw-wr-perf-brush-yuv

Templating the color space in the YUV converter, while optimal, makes it harder to reuse these routines between compositing and shader sampling. After some profiling, it seems possible to get the compiler to generate relative addressing modes for SSE constants so that they are as fast as directly addressed constant memory. This allows a table-driven YUVMatrix instead, replacing the switch()-based dispatch with a plain array lookup that doesn't impact performance.

Initial experiments with rigging up the new YUV upscaling routines to the shader seem to give a drastic speedup for the video in the page background. I will do some further work here to try to share some more code between them.

Assignee: bwerth → lsalzman
Keywords: leave-open
Pushed by lsalzman@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/19d4e9493ee0
un-template YUVConverter's color space to make it easier to reuse. r=jrmuizel
No longer regressions: 1693924

I noticed that Google ads with YUV video can have the same problem seen here: they hit brush_yuv_image and spend a lot of time there.

Attachment #9204734 - Attachment description: Bug 1674618 - reuse CompositeYUV routines to accelerate blendYUV. r?jrmuizel → Bug 1674618 - Reuse CompositeYUV routines to accelerate blendYUV. r?jrmuizel
Keywords: leave-open
Pushed by lsalzman@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b70b2ce3a2d9
Reuse CompositeYUV routines to accelerate blendYUV. r=jrmuizel

Comment on attachment 9204734 [details]
Bug 1674618 - Reuse CompositeYUV routines to accelerate blendYUV. r?jrmuizel

Beta/Release Uplift Approval Request

  • User impact if declined: Trying to do partial SW-WR rollout on Linux. Would like to avoid people reporting significant performance regressions months down the line.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Only impacts SW-WR/Linux. We can abort the rollout if unstable. Adequate time to deal with any bugs.
  • String changes made/needed:
Attachment #9204734 - Flags: approval-mozilla-beta?
Status: REOPENED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 88 Branch
Blocks: 1674944

Comment on attachment 9204734 [details]
Bug 1674618 - Reuse CompositeYUV routines to accelerate blendYUV. r?jrmuizel

Approved for 87.0b3.

Attachment #9204734 - Flags: approval-mozilla-beta? → approval-mozilla-beta+