Closed Bug 1623093 Opened 4 years ago Closed 3 years ago

sw-wr: Incredibly slow movements in Twitter's image preview

Categories

(Core :: Graphics: WebRender, defect, P3)

Tracking

()

RESOLVED FIXED
Tracking Status
firefox-esr68 --- unaffected
firefox-esr78 --- unaffected
firefox74 --- unaffected
firefox75 --- unaffected
firefox76 --- disabled
firefox81 --- disabled
firefox82 --- disabled
firefox83 --- disabled
firefox84 --- disabled
firefox85 --- disabled
firefox86 --- verified

People

(Reporter: jan, Unassigned)

References

(Blocks 2 open bugs)

Details

(Keywords: nightly-community, perf, regression)

Attachments

(1 file)

Attached video 2020-03-17_16-01-42.mp4

KDE, X11, Macbook Pro, Intel Iris 6100 (Broadwell GT3), 2560x1600
(The screencast is authentic.)
mozregression --launch 20200317093640 --pref gfx.webrender.all:true gfx.webrender.software:true -a https://twitter.com/UrbanFoxxxx/status/1224002506189086725

Bugbug thinks this bug is a regression, but please revert this change in case of error.

Keywords: regression

Because this bug's Severity has not been changed from the default since it was filed, and its Priority is P3 (Backlog), indicating it has been triaged, the bug's Severity is being updated to S3 (normal).

Severity: normal → S3

Twitter image carousel is still very slow with sw-wr in Fx83 on Windows.

Here is a profile:

https://share.firefox.dev/3dj8nCc

of me cycling through the images in my tweet:

https://twitter.com/cpeterso/status/1313712283978096640/photo/1

OS: Linux → All
Hardware: x86_64 → All
See Also: → 1670139

That is indeed incredibly slow.

Here's a breakdown of the 140ms of time:

11ms accum, 11.2ms: draw(cs_clip_rectangle_FAST_PATH, 8): 2989771 pixels in 1898 rows (avg 1575.221812 pixels/row, 3.738079ns/pixel)
11ms accum, 0.0ms: draw(cs_clip_box_shadow, 1): 2209 pixels in 47 rows (avg 47.000000 pixels/row, 13.490267ns/pixel)
11ms accum, 0.0ms: draw(cs_clip_rectangle_FAST_PATH, 2): 4418 pixels in 94 rows (avg 47.000000 pixels/row, 6.428248ns/pixel)
20ms accum, 9.0ms: draw(brush_image, 1): 1354897 pixels in 1277 rows (avg 1061.000000 pixels/row, 6.668994ns/pixel)
26ms accum, 6.2ms: draw(brush_image, 1): 819834 pixels in 1277 rows (avg 642.000000 pixels/row, 7.577997ns/pixel)
27ms accum, 0.9ms: draw(cs_clip_rectangle, 1): 102400 pixels in 40 rows (avg 2560.000000 pixels/row, 8.788086ns/pixel)
30ms accum, 3.0ms: draw(cs_clip_rectangle_FAST_PATH, 80): 758729 pixels in 3280 rows (avg 231.319817 pixels/row, 3.923140ns/pixel)
31ms accum, 0.2ms: draw(cs_clip_box_shadow, 24): 25264 pixels in 232 rows (avg 108.896552 pixels/row, 7.932236ns/pixel)
31ms accum, 0.0ms: draw(cs_clip_image, 1): 2988 pixels in 18 rows (avg 166.000000 pixels/row, 9.370817ns/pixel)
31ms accum, 0.1ms: draw(cs_clip_rectangle_FAST_PATH, 9): 21644 pixels in 235 rows (avg 92.102128 pixels/row, 4.412308ns/pixel)
31ms accum, 0.0ms: draw(cs_clip_rectangle_FAST_PATH, 2): 3042 pixels in 78 rows (avg 39.000000 pixels/row, 4.963840ns/pixel)
31ms accum, 0.5ms: draw(cs_clip_box_shadow, 2): 79120 pixels in 31 rows (avg 2552.258065 pixels/row, 6.889535ns/pixel)
31ms accum, 0.0ms: draw(brush_image_ALPHA_PASS, 7): 2388 pixels in 124 rows (avg 19.258065 pixels/row, 17.671692ns/pixel)
31ms accum, 0.0ms: draw(brush_image_ALPHA_PASS, 10): 1663 pixels in 59 rows (avg 28.186441 pixels/row, 17.378232ns/pixel)
31ms accum, 0.0ms: draw(ps_text_run_ALPHA_PASS, 63): 366 pixels in 48 rows (avg 7.625000 pixels/row, 90.437158ns/pixel)
64ms accum, 32.8ms: draw(brush_image_ALPHA_PASS, 1): 2822170 pixels in 1277 rows (avg 2210.000000 pixels/row, 11.619676ns/pixel)
64ms accum, 0.0ms: draw(brush_image_ALPHA_PASS, 1): 2401 pixels in 49 rows (avg 49.000000 pixels/row, 15.743440ns/pixel)
64ms accum, 0.0ms: draw(brush_solid_ALPHA_PASS, 1): 2209 pixels in 47 rows (avg 47.000000 pixels/row, 10.095066ns/pixel)
64ms accum, 0.0ms: draw(brush_image_ALPHA_PASS, 4): 2401 pixels in 98 rows (avg 24.500000 pixels/row, 18.950437ns/pixel)
64ms accum, 0.1ms: draw(ps_text_run_ALPHA_PASS, 37): 3594 pixels in 396 rows (avg 9.075758 pixels/row, 17.306622ns/pixel)
64ms accum, 0.1ms: draw(ps_text_run_ALPHA_PASS, 202): 2528 pixels in 365 rows (avg 6.926027 pixels/row, 29.786392ns/pixel)
64ms accum, 0.0ms: draw(ps_text_run_ALPHA_PASS, 17): 2260 pixels in 213 rows (avg 10.610329 pixels/row, 16.061947ns/pixel)
64ms accum, 0.0ms: draw(ps_text_run_ALPHA_PASS, 12): 1326 pixels in 204 rows (avg 6.500000 pixels/row, 20.211161ns/pixel)
65ms accum, 0.1ms: draw(ps_text_run_ALPHA_PASS, 40): 4719 pixels in 491 rows (avg 9.610998 pixels/row, 14.918415ns/pixel)
65ms accum, 0.0ms: draw(ps_text_run_ALPHA_PASS, 13): 2154 pixels in 216 rows (avg 9.972222 pixels/row, 16.805942ns/pixel)
65ms accum, 0.1ms: draw(ps_text_run_ALPHA_PASS, 33): 4260 pixels in 411 rows (avg 10.364964 pixels/row, 13.943662ns/pixel)
65ms accum, 0.0ms: draw(ps_text_run_ALPHA_PASS, 16): 2625 pixels in 257 rows (avg 10.214008 pixels/row, 14.209524ns/pixel)
65ms accum, 0.1ms: draw(ps_text_run_ALPHA_PASS, 59): 5005 pixels in 602 rows (avg 8.313953 pixels/row, 17.222777ns/pixel)
65ms accum, 0.0ms: draw(ps_text_run_ALPHA_PASS, 5): 862 pixels in 80 rows (avg 10.775000 pixels/row, 14.849188ns/pixel)
65ms accum, 0.0ms: draw(brush_image_ALPHA_PASS, 6): 3099 pixels in 130 rows (avg 23.838462 pixels/row, 13.778638ns/pixel)
66ms accum, 1.4ms: draw(brush_image, 12): 177440 pixels in 1208 rows (avg 146.887417 pixels/row, 7.649346ns/pixel)
69ms accum, 3.3ms: draw(brush_solid, 37): 18646827 pixels in 13054 rows (avg 1428.437797 pixels/row, 0.174780ns/pixel)
69ms accum, 0.0ms: draw(brush_image_ALPHA_PASS, 1): 2560 pixels in 1 rows (avg 2560.000000 pixels/row, 10.820313ns/pixel)
70ms accum, 0.9ms: draw(brush_image_ALPHA_PASS, 1): 104960 pixels in 41 rows (avg 2560.000000 pixels/row, 8.951982ns/pixel)
70ms accum, 0.0ms: draw(brush_image_ALPHA_PASS, 1): 256 pixels in 16 rows (avg 16.000000 pixels/row, 23.437500ns/pixel)
70ms accum, 0.0ms: draw(brush_solid_ALPHA_PASS, 1): 102400 pixels in 40 rows (avg 2560.000000 pixels/row, 0.205078ns/pixel)
70ms accum, 0.0ms: draw(brush_image_ALPHA_PASS, 6): 2808 pixels in 116 rows (avg 24.206897 pixels/row, 15.313390ns/pixel)
70ms accum, 0.0ms: draw(brush_opacity_ALPHA_PASS, 1): 256 pixels in 16 rows (avg 16.000000 pixels/row, 27.734375ns/pixel)
71ms accum, 0.2ms: draw(brush_solid_ALPHA_PASS, 28): 21490 pixels in 182 rows (avg 118.076923 pixels/row, 8.548162ns/pixel)
71ms accum, 0.1ms: draw(brush_image_ALPHA_PASS, 18): 7134 pixels in 232 rows (avg 30.750000 pixels/row, 13.512756ns/pixel)
71ms accum, 0.0ms: draw(ps_text_run_ALPHA_PASS_DUAL_SOURCE_BLENDING, 25): 1372 pixels in 209 rows (avg 6.564593 pixels/row, 22.594752ns/pixel)
71ms accum, 0.0ms: draw(ps_text_run_ALPHA_PASS_DUAL_SOURCE_BLENDING, 5): 362 pixels in 47 rows (avg 7.702128 pixels/row, 21.823204ns/pixel)
71ms accum, 0.0ms: draw(ps_text_run_ALPHA_PASS_DUAL_SOURCE_BLENDING, 31): 2053 pixels in 291 rows (avg 7.054983 pixels/row, 19.191427ns/pixel)
71ms accum, 0.0ms: draw(ps_text_run_ALPHA_PASS_DUAL_SOURCE_BLENDING, 2): 180 pixels in 20 rows (avg 9.000000 pixels/row, 26.111111ns/pixel)
71ms accum, 0.1ms: draw(brush_image_ALPHA_PASS, 8): 1752 pixels in 274 rows (avg 6.394161 pixels/row, 57.591324ns/pixel)
71ms accum, 0.1ms: draw(ps_text_run_ALPHA_PASS_DUAL_SOURCE_BLENDING, 47): 2926 pixels in 399 rows (avg 7.333333 pixels/row, 24.504443ns/pixel)
71ms accum, 0.0ms: draw(ps_text_run_ALPHA_PASS_DUAL_SOURCE_BLENDING, 2): 360 pixels in 20 rows (avg 18.000000 pixels/row, 25.000000ns/pixel)
71ms accum, 0.0ms: draw(ps_text_run_ALPHA_PASS_DUAL_SOURCE_BLENDING, 6): 622 pixels in 73 rows (avg 8.520548 pixels/row, 27.009646ns/pixel)
71ms accum, 0.2ms: draw(brush_solid_ALPHA_PASS, 5): 13692 pixels in 99 rows (avg 138.303030 pixels/row, 11.247444ns/pixel)
71ms accum, 0.0ms: draw(ps_text_run_ALPHA_PASS_DUAL_SOURCE_BLENDING, 6): 474 pixels in 60 rows (avg 7.900000 pixels/row, 34.599156ns/pixel)
71ms accum, 0.1ms: draw(brush_image_ALPHA_PASS, 9): 2210 pixels in 385 rows (avg 5.740260 pixels/row, 61.402715ns/pixel)
72ms accum, 0.2ms: draw(ps_text_run_ALPHA_PASS_DUAL_SOURCE_BLENDING, 131): 9082 pixels in 1199 rows (avg 7.574646 pixels/row, 21.801365ns/pixel)
72ms accum, 0.0ms: draw(brush_image_ALPHA_PASS, 1): 2401 pixels in 49 rows (avg 49.000000 pixels/row, 15.618492ns/pixel)
72ms accum, 0.1ms: draw(brush_solid_ALPHA_PASS, 5): 3109 pixels in 107 rows (avg 29.056075 pixels/row, 16.371824ns/pixel)
72ms accum, 0.0ms: draw(brush_image_ALPHA_PASS, 4): 2401 pixels in 98 rows (avg 24.500000 pixels/row, 19.575177ns/pixel)
72ms accum, 0.4ms: draw(brush_image_ALPHA_PASS, 22): 5536 pixels in 1042 rows (avg 5.312860 pixels/row, 68.244220ns/pixel)
72ms accum, 0.0ms: draw(ps_text_run_ALPHA_PASS_DUAL_SOURCE_BLENDING, 6): 424 pixels in 57 rows (avg 7.438596 pixels/row, 29.245283ns/pixel)
72ms accum, 0.0ms: draw(ps_text_run_ALPHA_PASS_DUAL_SOURCE_BLENDING, 1): 273 pixels in 13 rows (avg 21.000000 pixels/row, 26.739927ns/pixel)
72ms accum, 0.4ms: draw(ps_text_run_ALPHA_PASS_DUAL_SOURCE_BLENDING, 275): 17876 pixels in 2381 rows (avg 7.507770 pixels/row, 23.153949ns/pixel)
73ms accum, 0.2ms: draw(brush_solid_ALPHA_PASS, 2): 19461 pixels in 111 rows (avg 175.324324 pixels/row, 8.786804ns/pixel)
73ms accum, 0.1ms: draw(brush_image_ALPHA_PASS, 1): 4761 pixels in 69 rows (avg 69.000000 pixels/row, 14.324722ns/pixel)
73ms accum, 0.0ms: draw(brush_image_ALPHA_PASS, 5): 374 pixels in 34 rows (avg 11.000000 pixels/row, 40.909091ns/pixel)
73ms accum, 0.0ms: draw(ps_text_run_ALPHA_PASS_DUAL_SOURCE_BLENDING, 13): 845 pixels in 120 rows (avg 7.041667 pixels/row, 22.840237ns/pixel)
73ms accum, 0.6ms: draw(brush_image_ALPHA_PASS, 20): 8595 pixels in 3315 rows (avg 2.592760 pixels/row, 65.154159ns/pixel)
73ms accum, 0.0ms: draw(brush_image_ALPHA_PASS, 1): 2401 pixels in 49 rows (avg 49.000000 pixels/row, 15.410246ns/pixel)
74ms accum, 0.5ms: draw(brush_solid_ALPHA_PASS, 6): 83501 pixels in 121 rows (avg 690.090909 pixels/row, 6.488545ns/pixel)
74ms accum, 0.1ms: draw(brush_image_ALPHA_PASS, 14): 7391 pixels in 310 rows (avg 23.841935 pixels/row, 15.803004ns/pixel)
74ms accum, 0.1ms: draw(ps_text_run_ALPHA_PASS_DUAL_SOURCE_BLENDING, 45): 4710 pixels in 499 rows (avg 9.438878 pixels/row, 16.433121ns/pixel)
74ms accum, 0.0ms: draw(ps_text_run_ALPHA_PASS_DUAL_SOURCE_BLENDING, 18): 2336 pixels in 291 rows (avg 8.027491 pixels/row, 17.465753ns/pixel)
74ms accum, 0.1ms: draw(ps_text_run_ALPHA_PASS_DUAL_SOURCE_BLENDING, 46): 5574 pixels in 562 rows (avg 9.918149 pixels/row, 16.002870ns/pixel)
74ms accum, 0.1ms: draw(ps_text_run_ALPHA_PASS_DUAL_SOURCE_BLENDING, 24): 3713 pixels in 393 rows (avg 9.447837 pixels/row, 17.452195ns/pixel)
74ms accum, 0.2ms: draw(ps_text_run_ALPHA_PASS_DUAL_SOURCE_BLENDING, 96): 9806 pixels in 1065 rows (avg 9.207512 pixels/row, 16.642872ns/pixel)
74ms accum, 0.0ms: draw(ps_text_run_ALPHA_PASS_DUAL_SOURCE_BLENDING, 17): 2788 pixels in 272 rows (avg 10.250000 pixels/row, 16.463415ns/pixel)
75ms accum, 0.1ms: draw(ps_text_run_ALPHA_PASS_DUAL_SOURCE_BLENDING, 66): 3997 pixels in 389 rows (avg 10.275064 pixels/row, 19.064298ns/pixel)
75ms accum, 0.0ms: draw(ps_text_run_ALPHA_PASS_DUAL_SOURCE_BLENDING, 9): 1753 pixels in 140 rows (avg 12.521429 pixels/row, 15.801483ns/pixel)
75ms accum, 0.0ms: draw(brush_image_ALPHA_PASS, 2): 36 pixels in 1 rows (avg 36.000000 pixels/row, 58.333333ns/pixel)
75ms accum, 0.0ms: draw(brush_image_ALPHA_PASS, 7): 1329 pixels in 82 rows (avg 16.207317 pixels/row, 23.175320ns/pixel)
75ms accum, 0.0ms: draw(ps_text_run_ALPHA_PASS_DUAL_SOURCE_BLENDING, 11): 594 pixels in 78 rows (avg 7.615385 pixels/row, 21.548822ns/pixel)
109ms accum, 34.2ms: draw(brush_solid_ALPHA_PASS, 7): 6177150 pixels in 3885 rows (avg 1590.000000 pixels/row, 5.536874ns/pixel)
138ms accum, 29.5ms: draw(brush_image_ALPHA_PASS, 9): 2822430 pixels in 2568 rows (avg 1099.077103 pixels/row, 10.464316ns/pixel)
138ms accum, 0.1ms: draw(brush_solid_ALPHA_PASS, 6): 5588 pixels in 150 rows (avg 37.253333 pixels/row, 14.047960ns/pixel)
138ms accum, 0.0ms: draw(brush_image_ALPHA_PASS, 3): 843 pixels in 49 rows (avg 17.204082 pixels/row, 23.368921ns/pixel)
139ms accum, 0.6ms: draw(brush_image_ALPHA_PASS, 2): 1551 pixels in 1311 rows (avg 1.183066 pixels/row, 414.764668ns/pixel)
141ms accum, 2.0ms: draw(brush_image_ALPHA_PASS, 5): 165622 pixels in 950 rows (avg 174.338947 pixels/row, 12.145729ns/pixel)
141ms accum, 0.0ms: draw(brush_image_ALPHA_PASS, 7): 1845 pixels in 137 rows (avg 13.467153 pixels/row, 23.306233ns/pixel)
Finish 156.8ms 141.2ms 3525120 pixels
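To see which shaders dominate a log like the one above, the draw lines can be grouped by shader name. This is a minimal sketch, not part of any Firefox tooling: the regular expression is inferred from the line format shown here, and `summarize` and `SAMPLE` are hypothetical names used only for illustration. The sample contains four lines copied from the log.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   "20ms accum, 9.0ms: draw(brush_image, 1): 1354897 pixels in 1277 rows (avg ...)"
# The pattern is an assumption based on the log format shown in this bug.
DRAW_RE = re.compile(
    r"(?P<accum>\d+)ms accum, (?P<ms>[\d.]+)ms: "
    r"draw\((?P<shader>\w+), (?P<count>\d+)\): "
    r"(?P<pixels>\d+) pixels in (?P<rows>\d+) rows"
)


def summarize(log_text):
    """Return {shader: [total_ms, total_pixels]} for every draw line found."""
    totals = defaultdict(lambda: [0.0, 0])
    for line in log_text.splitlines():
        m = DRAW_RE.search(line)
        if m is None:
            continue  # skips the "Finish ..." line and anything unrecognized
        entry = totals[m.group("shader")]
        entry[0] += float(m.group("ms"))
        entry[1] += int(m.group("pixels"))
    return dict(totals)


# Four lines copied verbatim from the log above.
SAMPLE = """\
20ms accum, 9.0ms: draw(brush_image, 1): 1354897 pixels in 1277 rows (avg 1061.000000 pixels/row, 6.668994ns/pixel)
64ms accum, 32.8ms: draw(brush_image_ALPHA_PASS, 1): 2822170 pixels in 1277 rows (avg 2210.000000 pixels/row, 11.619676ns/pixel)
109ms accum, 34.2ms: draw(brush_solid_ALPHA_PASS, 7): 6177150 pixels in 3885 rows (avg 1590.000000 pixels/row, 5.536874ns/pixel)
138ms accum, 29.5ms: draw(brush_image_ALPHA_PASS, 9): 2822430 pixels in 2568 rows (avg 1099.077103 pixels/row, 10.464316ns/pixel)
"""

if __name__ == "__main__":
    # Print shaders sorted by total time, most expensive first.
    ranked = sorted(summarize(SAMPLE).items(), key=lambda kv: -kv[1][0])
    for shader, (ms, pixels) in ranked:
        print(f"{shader:28s} {ms:6.1f} ms  {pixels:>10d} px")
```

Run over the full log, a summary like this makes it easier to see that a handful of large draws account for most of the frame time.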

The capture that I'm looking at is at roughly the halfway point of the transition.

The biggest thing that stands out is in the RenderDoc capture of the last big brush_image_ALPHA_PASS draw. It isn't until this draw that the two large foreground images actually show up. These images were previously drawn into a temporary that ends up with alpha. If, instead, we could get these images drawn early, it would save quite a bit of work:

  • We wouldn't be drawing them twice
  • We could avoid alpha blending them
  • The depth buffer would save a bunch of work for the rest of the page
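As a rough back-of-the-envelope check on the second point only, the per-pixel rates from the log can be plugged in. This is a sketch under loud assumptions: it supposes the final alpha-blended draw could instead run at the rate measured for the large opaque brush_image draw, which may not hold in practice, and it ignores both the saved duplicate draw and any depth-buffer savings. The constant and function names are hypothetical.

```python
# Figures copied from the log above; the scenario itself is hypothetical.
ALPHA_PIXELS = 2_822_430        # last big brush_image_ALPHA_PASS draw
ALPHA_NS_PER_PIXEL = 10.464316  # its measured rate
OPAQUE_NS_PER_PIXEL = 6.668994  # rate of the 1354897-pixel opaque brush_image draw


def draw_ms(pixels, ns_per_pixel):
    """Convert a pixel count and a per-pixel cost in nanoseconds to milliseconds."""
    return pixels * ns_per_pixel / 1e6


measured = draw_ms(ALPHA_PIXELS, ALPHA_NS_PER_PIXEL)
# Assumption: the same pixels drawn without blending cost the opaque rate.
hypothetical = draw_ms(ALPHA_PIXELS, OPAQUE_NS_PER_PIXEL)

if __name__ == "__main__":
    print(f"alpha pass as measured:   {measured:.1f} ms")
    print(f"same pixels, opaque rate: {hypothetical:.1f} ms")
    print(f"estimated difference:     {measured - hypothetical:.1f} ms")
```

The measured figure reproduces the 29.5ms entry in the log, which is a useful sanity check that pixels times ns/pixel is how the per-draw times are derived.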

Other than that, our existing plans to specialize brush_image_ALPHA_PASS, brush_image, and brush_solid should help here.

There's a transform-3d that contains the image so I wonder if that's causing the badness.

Blocks: wr-gpu-time
See Also: → 1664478

It looks like a lot of this sadness is caused by a 0.01px border radius on an element that contains the carousel. This forces us to draw to a temporary. Removing the border radius makes performance quite a bit better (still not great, though).

No longer blocks: 1621454
Depends on: 1621454

It looks like Twitter fixed the border radius, and with Lee's fast path stuff the frame times go from 45ms down to 20ms for me:
https://share.firefox.dev/3kE07Q0

I retested SW-WR in today's 85 Nightly and Twitter image animations are much faster than they were a couple of months ago, but still worse than hardware-accelerated WR.

Test case:
https://twitter.com/benjedwards/status/1338476543971045377

Want to test again now that bug 1669841 has landed? Also maybe compare to hardware acceleration being disabled?

Flags: needinfo?(cpeterson)

(In reply to Jeff Muizelaar [:jrmuizel] from comment #11)

Want to test again now that bug 1669841 has landed? Also maybe compare to hardware acceleration being disabled?

I just tested Nightly 86 build 2021-01-15 using the tweet from comment 10. I'm comparing WebRender (Software D3D11) vs "Intel(R) UHD Graphics 630" GPU on Windows 10.

What is the expected performance difference for SW-WR compared to hardware WR? SW-WR is still visibly slower when slowly cycling through the tweet's image carousel. I estimate the slide transition takes roughly 50% longer, but the performance is probably good enough for this Twitter case if you'd like to consider this bug fixed.

Profile of hardware WR:
https://share.firefox.dev/3ii2mrF

Profile of SW-WR:
https://share.firefox.dev/3ij7NGN

Flags: needinfo?(cpeterson) → needinfo?(jmuizelaar)

It's expected to be slower than HW-WR. I'm most interested in how it compares to Basic Layers (aka the current no-hardware-acceleration path).

Flags: needinfo?(jmuizelaar)

(In reply to Jeff Muizelaar [:jrmuizel] from comment #13)

It's expected to be slower than HW-WR. I'm most interested in how it compares to Basic Layers (aka the current no-hardware-acceleration path).

I just tested Basic Layers (with layers.acceleration.disabled = true) and SW-WR is about the same speed or maybe a little faster! I'll close this bug as fixed by bug 1669841.

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED