Slow Frames detected on singlesprout home page

REOPENED
Assigned to

Status

()

defect
P3
normal
REOPENED
4 months ago
a month ago

People

(Reporter: yoasif, Assigned: kvark)

Tracking

(Blocks 5 bugs, {nightly-community})

Trunk
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox66 disabled, firefox67 wontfix, firefox68 affected)

Details

()

Attachments

(4 attachments)

Reporter

Description

4 months ago

Seeing the red animation after enabling gfx.webrender.debug.slow-frame-indicator on https://www.singlesprout.com/

Profile: https://perfht.ml/2VPrlHn

Slow Frame #01 Frame 2729(https://www.singlesprout.com/) CONTENT_FRAME_TIME 711 - Transaction start 0.250244, main-thread time 1.360601, full paint time 8.513312, Skipped composites 0, Composite start 16.124971, Resource upload time 0.000434, GPU cache upload time 0.253842, Render time 101.557626, Composite time 102.260539
Slow Frame #02 Frame 2732(https://www.singlesprout.com/) CONTENT_FRAME_TIME 662 - Transaction start 0.199985, main-thread time 1.040204, full paint time 1.750260, Skipped composites 0, Composite start 60.498300, Resource upload time 2.838174, GPU cache upload time 1.248226, Render time 47.848765, Composite time 49.632026
Slow Frame #03 Frame 2730(https://www.singlesprout.com/) CONTENT_FRAME_TIME 645 - Transaction start 0.291483, main-thread time 1.364013, full paint time 2.188527, Skipped composites 0, Composite start 16.129933, Resource upload time 0.001761, GPU cache upload time 0.058995, Render time 5.150866, Composite time 91.181365
Slow Frame #04 Frame 315(https://www.singlesprout.com/) CONTENT_FRAME_TIME 616 - Transaction start 0.336957, main-thread time 2.412339, full paint time 8.948310, Skipped composites 0, Composite start 16.040572, Resource upload time 1.621313, GPU cache upload time 0.114735, Render time 84.312407, Composite time 86.308967
Slow Frame #05 Frame 316(https://www.singlesprout.com/) CONTENT_FRAME_TIME 559 - Transaction start 0.614784, main-thread time 3.238029, full paint time 5.423443, Skipped composites 0, Composite start 12.906398, Resource upload time 0.001895, GPU cache upload time 0.324688, Render time 6.633799, Composite time 76.681015
Slow Frame #06 Frame 2772(https://www.singlesprout.com/) CONTENT_FRAME_TIME 506 - Transaction start 0.243645, main-thread time 1.299267, full paint time 1.991710, Skipped composites 0, Composite start 16.103180, Resource upload time 0.663972, GPU cache upload time 0.268590, Render time 67.415635, Composite time 68.084906
Slow Frame #07 Frame 2731(https://www.singlesprout.com/) CONTENT_FRAME_TIME 471 - Transaction start 0.381023, main-thread time 2.520607, full paint time 4.443342, Skipped composites 0, Composite start 16.042296, Resource upload time 0.000620, GPU cache upload time 0.102412, Render time 1.800384, Composite time 62.339946
Slow Frame #08 Frame 2934(https://www.singlesprout.com/) CONTENT_FRAME_TIME 466 - Transaction start 0.465604, main-thread time 7.433910, full paint time 11.099232, Skipped composites 0, Composite start 15.917207, Resource upload time 2.745110, GPU cache upload time 0.385973, Render time 4.642055, Composite time 61.459521
Slow Frame #09 Frame 2774(https://www.singlesprout.com/) CONTENT_FRAME_TIME 437 - Transaction start 0.962038, main-thread time 2.663309, full paint time 4.660299, Skipped composites 0, Composite start 16.068381, Resource upload time 0.000477, GPU cache upload time 0.060649, Render time 55.797323, Composite time 56.514158
Slow Frame #10 Frame 2773(https://www.singlesprout.com/) CONTENT_FRAME_TIME 430 - Transaction start 0.190110, main-thread time 0.870163, full paint time 1.613620, Skipped composites 0, Composite start 16.065874, Resource upload time 0.002077, GPU cache upload time 0.273269, Render time 3.445983, Composite time 55.526719

I can reproduce this by scrolling down the list of clients, and mousing over the client names (such that they animate their size).

Lots of time spent on the Renderer thread, just blocking waiting for the previous frame to complete.

Enabling picture caching helps a lot, though it seems like it's possible to temporary break it sometimes and get ~1second of slowness, and then it goes back to being fast (stands out in the gecko profiler).

WR profiler shows a lot of time in B_Image (and also target init).

Any ideas Glenn?

Flags: needinfo?(gwatson)

Comment 2

4 months ago

I tried with the latest picture caching pages - it's still very janky when scrolling. There seems to be a very large number of color targets (7+) that get allocated and used, and a lot of mix-blend-mode, which is currently quite slow. This will need further investigation, both to see why it's invalidating with picture caching, and why it's so slow to rasterize.

Flags: needinfo?(gwatson)
Priority: -- → P3

Updated

4 months ago
Assignee: nobody → gwatson

Comment 3

4 months ago

Notes for myself:

  • The mix-blend code that isolates stacking contexts doesn't take into account iframes / content rects, so is doing an extra blit and confusing the invalidation logic.

  • There is a world clip on one of the background images that is tripping up the invalidation logic - it appears to be 1px larger in the y-axis than I would expect. Might be a Gecko bug, might be something that can be handled in WR.

  • The mix-blend-mode is slow here because there are two large, chained mix-blend-modes. We might need to consider implementing the options previously discussed to make mix-blend-mode faster than the current implementation.

  • Although there are 7 reported color targets, at least 2 of them are only storage for the mix-blend-mode readback regions. However, we still clear these targets on the GPU, which is redundant and costs ~2-3 ms GPU time.

Comment 4

4 months ago

There are a number of invalidation issues here, which will need separate patches.

The first patch is https://bugzilla.mozilla.org/show_bug.cgi?id=1521689.

Comment 5

4 months ago

A second patch is up - this fixes interning text runs on this page.

https://bugzilla.mozilla.org/show_bug.cgi?id=1522028

There are still pending patches to resolve all the invalidations on this page.

Comment 6

4 months ago

WIP patch for Matt to take a look at.

No longer blocks: 1524117
Depends on: 1524117

Comment 7

4 months ago

Matt, do you want to take this bug? Once your mix-blend-mode patch lands, we'll probably want to either close this or unassign and drop priority for mvp?

Flags: needinfo?(matt.woodrow)

Comment 8

4 months ago

This is not great on integrated GPUs and still needs some work, but it seems to run well on discrete GPUs that I've tested.

Jeff, could you test / verify if it's OK to drop the priority on this one?

Flags: needinfo?(matt.woodrow) → needinfo?(jmuizelaar)

Bug 1524117 never landed. Presumably it helps with perf on this. Do you want to fix the wrench test and reland that and then we can drop the priority of this one?

Flags: needinfo?(jmuizelaar)
Priority: P3 → P4

Now that bug 1524117 has landed this seems pretty good.

Updated

3 months ago
Assignee: gwatson → nobody
Assignee

Comment 11

3 months ago

Fixing our mix-blend performance is fairly important.

Assignee: nobody → dmalyshau
Status: NEW → ASSIGNED
Assignee

Comment 12

3 months ago

This is a new implementation of mix-blend compositing that is meant to be more idiomatic to WR and efficient.

Previously, mix-blend mode was composed in the following way:

  1. parent stacking context was forced to isolate
  2. source picture is also isolated
  3. when rendering the isolated context, the framebuffer is read upon reaching the source. Both the readback and the source are placed in the RT cache.
  4. a mix-blend draw call is issued to read from those cache segments and blend on top of the backdrop

The new implementation works by using the picture cutting (intruduced for preserve-3D contexts earlier) and some bits of magic:

  1. backdrop stacking context is isolated with a special composition mode that prevents it from actually rendeing unless the suorce stacking context is invisible.
  2. source stacking context is isolated with mix-blend composition mode that has a pointer to the backdrop picture
  3. the instance of the backdrop picture is placed as a peer of the source picture (not a child)
  4. if the backdrop is invisible, the source is drawn as a simple blit
  5. otherwise, it's a draw call that reads from the isolated backdrop and source textures

Note the differences:

  • parent stacking context is not isolated, but backdrop is
  • no framebuffer readback is involved
  • the source and backdrop pictures are rendered in parallel in a pass, improving the batching
  • we don't blend onto the backdrop while reading from the backdrop copy at the same time
  • the depth of the render pass tree is reduced: previously the parent and the source were isolated, now the source and the backdrop, which are siblings
Assignee

Comment 13

3 months ago

Checked a GPU capture of the new implementation on Singesprout.com versus the old one: slightly reduced draw call count (36 versus 48), missing 3 framebuffer blits, otherwise seems identical. Therefore, I don't expect a major perf win here, but doing this re-implementation has a value anyway: it fits the picture/surface/task structure much better.

Assignee

Comment 18

3 months ago

Figured out the issues on Windows and updated before landing. The new try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=14d3ae3115c7614a3711ed1529f9ab559436c4cc
It shows a few unexpected passes:

  • blend-gradient-background-color.html
  • background-blending-gradient-color.html
Flags: needinfo?(dmalyshau)

Comment 19

3 months ago
Pushed by dmalyshau@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/4f11ca736067
Remove accidental wrench log r=kats

Comment 20

3 months ago
bugherder
Status: ASSIGNED → RESOLVED
Last Resolved: 3 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla67

Updated

3 months ago
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee

Comment 22

3 months ago

Razvan, these are unexpected passes. I suppose all I need is to mark them as passing with this PR?

Flags: needinfo?(dmalyshau) → needinfo?(rmaries)
Assignee

Comment 24

3 months ago
Flags: needinfo?(rmaries)

Comment 25

3 months ago
bugherder
Status: REOPENED → RESOLVED
Last Resolved: 3 months ago3 months ago
Resolution: --- → FIXED

Updated

3 months ago
Depends on: 1530333

Comment 26

3 months ago
Pushed by kgupta@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/640e731ed52e
Follow-up to bump the fuzz number slightly, so the test passes on AppVeyor too. rs=kvark
Duplicate of this bug: 1526347

Comment 29

3 months ago
Backout by rgurzau@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/6878f6294d68
Backed out 2 changesets for performance regression (bug 1530611) on a CLOSED TREE.

Comment 30

3 months ago
Backout by rgurzau@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/c1bd622a4093
Backed out changeset 6878f6294d68 for bustages on a CLOSED TREE.

Comment 32

2 months ago
Pushed by dmalyshau@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/84bc38dc1fde
Revert the mix-blend rewrite r=jrmuizel
Assignee

Updated

2 months ago
Status: RESOLVED → REOPENED
Resolution: FIXED → ---

Comment 33

2 months ago
bugherder
Status: REOPENED → RESOLVED
Last Resolved: 3 months ago2 months ago
Resolution: --- → FIXED
Assignee

Comment 34

2 months ago

The original change was reverted.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla67 → ---
Blocks: wr-68
Priority: P4 → P3
You need to log in before you can comment on or make changes to this bug.