Scrolling https://www.pixellot.tv/ is slow with sw-wr
Categories: Core :: Graphics: WebRender, defect
People: Reporter: jrmuizel, Assigned: lsalzman
References: Blocks 1 open bug
Attachments: 3 files, 1 obsolete file
A lot of time is being spent in the fast paths so I expect we might be doing something bad at a higher level.
https://share.firefox.dev/3kL885P
Comment 2•4 years ago
https://atlas-rogues.com is another site with exceptionally slow scrolling. I'll profile that as well and if it has a meaningfully different profile, open a new bug for it.
Comment 3•4 years ago
(In reply to Brad Werth [:bradwerth] from comment #2)
> https://atlas-rogues.com is another site with exceptionally slow scrolling. I'll profile that as well and if it has a meaningfully different profile, open a new bug for it.
Profiling this site shows all the time being spent in cs_clip_image_TEXTURE_2D_frag::run. That's very different from the profile for the pixellot.tv site. I'll open a new bug once I understand how to characterize this new problem.
Comment 4•4 years ago
Optimized Nightly profile of www.pixellot.tv scrolling shows that the top contributors are:
- 24% brush_image_ALPHA_PASS_frag::swgl_drawSpanRGBA8
- 15% brush_mix_blend_ALPHA_PASS_frag::brush_fs
- 12% brush_opacity_ALPHA_PASS_frag::swgl_drawSpanRGBA8
- 12% brush_image_ALPHA_PASS_TEXTURE_2D_frag::swgl_drawSpanRGBA8
Since it's an optimized build, there's essentially no more detailed breakdown.
A debug Nightly profile of www.pixellot.tv scrolling shows more detail, but skews the contribution numbers. Focusing on the largest contributor in release, brush_image_ALPHA_PASS_frag::swgl_drawSpanRGBA8 shows two major sub-contributors:
- 40% glsl::textureLinearUnpackedRGBA8
- 35% blend_pixels_RGBA8
I'm not sure which of these are considered slow or fast paths, but I'll see if I can find anything to speed up either of these functions.
Comment 5•4 years ago
The slow scrolling on pixellot.tv seems to be triggered by the use of SVG paths. This testcase is an isolated example from the site, taken from the data-icon=file-video svg.
Comment 6•4 years ago
Analyzing the testcase with PRINT_TIMINGS on in release wrench, the timing data from the first captured slow frame is:
20.734ms draw(composite, 4): 1937408 pixels in 1892 rows (avg 1024.000000 pixels/row, 10.701992ns/pixel)
5.892ms draw(composite, 2): 468980 pixels in 716 rows (avg 655.000000 pixels/row, 12.562553ns/pixel)
2.597ms draw(composite, 1): 366592 pixels in 358 rows (avg 1024.000000 pixels/row, 7.085236ns/pixel)
2.045ms draw(composite, 1): 366592 pixels in 358 rows (avg 1024.000000 pixels/row, 5.578810ns/pixel)
2.482ms draw(composite, 1): 444416 pixels in 434 rows (avg 1024.000000 pixels/row, 5.585960ns/pixel)
5.498ms draw(composite, 3): 794844 pixels in 1458 rows (avg 545.160494 pixels/row, 6.916766ns/pixel)
6.060ms draw(composite, 4): 1719296 pixels in 2048 rows (avg 839.500000 pixels/row, 3.524591ns/pixel)
0.275ms draw(composite, 2): 11456 pixels in 716 rows (avg 16.000000 pixels/row, 24.039368ns/pixel)
0.614ms draw(composite, 1): 91840 pixels in 140 rows (avg 656.000000 pixels/row, 6.688055ns/pixel)
Finish
Subsequent frames are similar but a bit faster: the first two draws complete in about half the time they took on this first frame.
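For anyone who wants to compare frames, here is a minimal sketch of how the per-draw lines above could be parsed and the ns/pixel figures re-derived. The regular expression is inferred from the dumps in this bug, not taken from the SWGL source, and the helper names (parse_timing, ns_per_pixel) are made up for illustration.

```python
import re

# Pattern inferred from the PRINT_TIMINGS lines pasted in this bug.
# It is an assumption about the format, not a copy of the SWGL code.
TIMING_RE = re.compile(
    r"(?P<ms>[\d.]+)ms draw\((?P<kind>\w+), (?P<count>\d+)\): "
    r"(?P<pixels>\d+) pixels in (?P<rows>\d+) rows"
)


def parse_timing(line):
    """Return (milliseconds, pixels, rows) for one draw line, or None."""
    m = TIMING_RE.search(line)
    if m is None:
        return None
    return float(m["ms"]), int(m["pixels"]), int(m["rows"])


def ns_per_pixel(ms, pixels):
    """Convert a draw time in milliseconds to nanoseconds per pixel."""
    return ms * 1e6 / pixels


lines = [
    "20.734ms draw(composite, 4): 1937408 pixels in 1892 rows (avg 1024.000000 pixels/row, 10.701992ns/pixel)",
    "5.892ms draw(composite, 2): 468980 pixels in 716 rows (avg 655.000000 pixels/row, 12.562553ns/pixel)",
]

draws = [parse_timing(line) for line in lines]
for ms, pixels, rows in draws:
    print(f"{ms:.3f}ms  {pixels / rows:.1f} px/row  {ns_per_pixel(ms, pixels):.3f} ns/px")

# Summing the draw times gives a rough lower bound on the frame time.
total_ms = sum(ms for ms, _, _ in draws)
```

Because the millisecond value is printed with only three decimals, the recomputed ns/pixel matches the logged figure closely for large draws and less closely for very small ones.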
Comment 7•4 years ago
Comment on attachment 9190682 [details]
svg data-icon file-video.html
This testcase shows a real problem, but it's not the problem that manifests on the pixellot.tv site. Bug 1680821 has been opened to address the problem demonstrated by this testcase.
Comment 8•4 years ago
This testcase shows a greatly reduced example of what's slowing down the pixellot.tv site. The slowdown occurs when scrolling through a section of the page where a very large div has been skewed and is using mix-blend-mode: multiply. On the pixellot.tv site, the corresponding design element is white and is layered behind other elements, so it's not obvious that there is a large, skewed, blended div in the background.
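As a rough illustration of why a white multiply layer is invisible, here is a minimal sketch of the multiply blend function from the CSS Compositing and Blending spec, B(cb, cs) = cb × cs, restricted to fully opaque colors. The function name and the sample colors are made up for the example; this is not how SWGL implements brush_mix_blend.

```python
def multiply_blend(backdrop, source):
    """Multiply blend for fully opaque colors with channels in [0.0, 1.0].

    Simplifying assumption: both layers are opaque, so the alpha terms
    of the general compositing formula drop out.
    """
    return tuple(cb * cs for cb, cs in zip(backdrop, source))


backdrop = (0.2, 0.5, 0.8)   # arbitrary sample color
white = (1.0, 1.0, 1.0)
gray = (0.5, 0.5, 0.5)

# Multiplying by white is the identity, so the layer changes nothing visually.
unchanged = multiply_blend(backdrop, white)

# Any darker source darkens the backdrop.
darkened = multiply_blend(backdrop, gray)
```

The result for a white source equals the backdrop, which fits the observation that the element is not noticeable on the page. Unless a renderer special-cases this, the blend still has to read the backdrop and run for every covered pixel.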
Comment 9•4 years ago
Wrench invocation cargo run --release --features "software" -- --software show ~/wr-capture/ showing PRINT_TIMINGS from the testcase, first frame only:
25.764ms draw(composite, 6): 2163712 pixels in 2873 rows (avg 753.119387 pixels/row, 11.907307ns/pixel)
3.502ms draw(composite, 2): 606920 pixels in 825 rows (avg 735.660606 pixels/row, 5.769454ns/pixel)
1.714ms draw(composite, 1): 320512 pixels in 313 rows (avg 1024.000000 pixels/row, 5.348851ns/pixel)
4.327ms draw(composite, 3): 1183744 pixels in 1536 rows (avg 770.666667 pixels/row, 3.655763ns/pixel)
4.101ms draw(composite, 1): 135168 pixels in 512 rows (avg 264.000000 pixels/row, 30.338993ns/pixel)
6.743ms draw(composite, 2): 1048576 pixels in 1024 rows (avg 1024.000000 pixels/row, 6.430601ns/pixel)
2.126ms draw(composite, 2): 328286 pixels in 358 rows (avg 917.000000 pixels/row, 6.475457ns/pixel)
Finish
Comment 10•3 years ago (Reporter)
Here's an updated profile: https://share.firefox.dev/3jRPF7Y
Frame times are down from 80-90ms to 30-40ms.
Comment 11•3 years ago (Reporter)
Let's keep this open as performance is still not great.
Comment 12•3 years ago (Assignee)
It looks like the top of the pixellot.tv page draws a YUV video in the background which has to punt to brush_yuv_image. We could do better here by utilizing the strategies from bug 1692731 to speed up the upscaling filter for YUV textures like I did for RGBA ones...
Comment 13•3 years ago (Assignee)
Templating the color space in the YUV converter, while optimal, does make it harder to reuse these routines between compositing and shader sampling. After some profiling, it seems possible to get the compiler to generate relative addressing modes for SSE constants such that they are as fast as directly addressing constant memory. This allows us to have a table-driven YUVMatrix instead, which removes the switch()y-ness in favor of a simple array lookup that doesn't impact performance.
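To make the table-driven idea concrete, here is a minimal sketch in which the color space is a runtime index into a coefficient table rather than a template parameter or a switch. All names (ColorSpace, YUV_TABLE, yuv_to_rgb) are hypothetical, and the math is the plain full-range floating-point YCbCr conversion derived from the standard luma weights. The actual patch works with SSE constants and is not reproduced here.

```python
from enum import IntEnum


class ColorSpace(IntEnum):
    """Hypothetical color space indices, for illustration only."""
    BT601 = 0
    BT709 = 1
    BT2020 = 2


def _coefficients(kr, kb):
    """Derive full-range YCbCr -> RGB coefficients from the luma weights."""
    kg = 1.0 - kr - kb
    return (
        2.0 * (1.0 - kr),            # Cr contribution to R
        2.0 * kb * (1.0 - kb) / kg,  # Cb contribution to G (subtracted)
        2.0 * kr * (1.0 - kr) / kg,  # Cr contribution to G (subtracted)
        2.0 * (1.0 - kb),            # Cb contribution to B
    )


# One row per color space, indexed by the enum value, so picking a
# matrix is an array lookup instead of a switch.
YUV_TABLE = (
    _coefficients(0.299, 0.114),    # BT.601
    _coefficients(0.2126, 0.0722),  # BT.709
    _coefficients(0.2627, 0.0593),  # BT.2020
)


def yuv_to_rgb(y, cb, cr, color_space):
    """Convert one full-range sample; cb and cr are centered on 0."""
    r_cr, g_cb, g_cr, b_cb = YUV_TABLE[color_space]
    return (y + r_cr * cr, y - g_cb * cb - g_cr * cr, y + b_cb * cb)


gray = yuv_to_rgb(0.5, 0.0, 0.0, ColorSpace.BT709)
```

Since the color space is just an index, the same conversion routine can be called from more than one place with whatever color space it is given, which is the reuse the comment above is after.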
Comment 14•3 years ago (Assignee)
Initial experiments with rigging up the new YUV upscaling routines to the shader seem to give a drastic speedup for the video in the page background. I will do some further work here to try to share some more code between them.
Comment 15•3 years ago
Pushed by lsalzman@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/19d4e9493ee0 un-template YUVConverter's color space to make it easier to reuse. r=jrmuizel
Comment 16•3 years ago
bugherder
Comment 17•3 years ago (Assignee)
I noticed that Google ads with YUV video can hit the same problem seen here: they go through brush_yuv_image and spend a lot of time there.
Comment 18•3 years ago (Assignee)
Comment 19•3 years ago
Pushed by lsalzman@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/b70b2ce3a2d9 Reuse CompositeYUV routines to accelerate blendYUV. r=jrmuizel
Comment 20•3 years ago (Assignee)
Comment on attachment 9204734 [details]
Bug 1674618 - Reuse CompositeYUV routines to accelerate blendYUV. r?jrmuizel
Beta/Release Uplift Approval Request
- User impact if declined: Trying to do partial SW-WR rollout on Linux. Would like to avoid people reporting significant performance regressions months down the line.
- Is this code covered by automated tests?: Yes
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): Only impacts SW-WR/Linux. We can abort the rollout if unstable. Adequate time to deal with any bugs.
- String changes made/needed:
Comment 21•3 years ago
bugherder
Comment 22•3 years ago
Comment on attachment 9204734 [details]
Bug 1674618 - Reuse CompositeYUV routines to accelerate blendYUV. r?jrmuizel
Approved for 87.0b3.
Comment 23•3 years ago
bugherder uplift