Closed Bug 1507680 Opened 1 year ago Closed 1 year ago

Record detailed stats on slow frames

Categories

(Core :: Graphics: WebRender, enhancement)

enhancement
Not set

Tracking

()

RESOLVED FIXED
mozilla65
Tracking Status
firefox65 --- fixed

People

(Reporter: mattwoodrow, Assigned: mattwoodrow)

References

(Blocks 1 open bug)

Details

Attachments

(2 files)

We currently record CONTENT_FRAME_TIME in telemetry which lets us know when frames are too slow, but doesn't give us much information about why they took so long.

It would be nice to have a more detailed breakdown of where the time went, so we can know what to work on.

As a first step, I'm going to try collecting stats on the slowest frames, and make them available in about:support.
Attached patch WIPSplinter Review
This is my first attempt, it sends frame stats to the parent process' gfxPlatform and then exposes them in about:support's Diagnostics section.

Example output (from a slow debug build):

Slow Frame #01	Frame 14 CONTENT_FRAME_TIME 2443 - Transaction start 1.442925, full paint time 11.478862, Composite start 400.085697, Resource upload time 0.008714, GPU cache upload time 0.325035, Composite time 8.579667
Slow Frame #02	Frame 1 CONTENT_FRAME_TIME 1600 - Transaction start 111.294706, full paint time 254.359177, Composite start 375.402980, Resource upload time 0.036861, GPU cache upload time 0.340851, Composite time 2.610739
Slow Frame #03	Frame 7 CONTENT_FRAME_TIME 712 - Transaction start 1.652627, full paint time 10.556401, Composite start 116.421181, Resource upload time 0.007068, GPU cache upload time 0.266846, Composite time 3.945050
Slow Frame #04	Frame 8 CONTENT_FRAME_TIME 586 - Transaction start 6.090603, full paint time 7.301039, Composite start 99.886457, Resource upload time 0.007068, GPU cache upload time 0.266846, Composite time 3.945050
Slow Frame #05	Frame 7 CONTENT_FRAME_TIME 424 - Transaction start 23.408652, full paint time 50.470691, Composite start 84.219413, Resource upload time 0.722010, GPU cache upload time 0.184319, Composite time 9.907989
Slow Frame #06	Frame 4 CONTENT_FRAME_TIME 398 - Transaction start 34.840794, full paint time 27.516830, Composite start 67.353343, Resource upload time 2.653419, GPU cache upload time 0.497601, Composite time 33.798601
Slow Frame #07	Frame 19 CONTENT_FRAME_TIME 349 - Transaction start 2.821212, full paint time 31.500785, Composite start 52.259315, Resource upload time 0.007805, GPU cache upload time 0.338629, Composite time 8.770072
Slow Frame #08	Frame 10 CONTENT_FRAME_TIME 317 - Transaction start 11.274926, full paint time 39.487627, Composite start 50.363253, Resource upload time 0.005364, GPU cache upload time 0.266546, Composite time 13.729517
Slow Frame #09	Frame 9 CONTENT_FRAME_TIME 308 - Transaction start 25.998221, full paint time 30.185789, Composite start 65.701486, Resource upload time 0.005298, GPU cache upload time 0.267921, Composite time 11.570285
Slow Frame #10	Frame 21 CONTENT_FRAME_TIME 306 - Transaction start 10.849745, full paint time 24.766373, Composite start 49.909384, Resource upload time 0.119387, GPU cache upload time 0.225979, Composite time 11.889683


The slowest one is very suspicious there, it's not obvious why we did nothing after finishing full-paint (at around 12.9ms after the initial vsync) until the 400ms mark.
Attachment #9025529 - Flags: feedback?(jmuizelaar)
Comment on attachment 9025529 [details] [diff] [review]
WIP

Sotaro, would be good to have your feedback on this too.

The slowest frame here seems like a bug where we might not be triggering the composite after scene building is done.

Could be related to the reftest hang you and tnikkel are looking at?

It's extra weird that it's right on 400ms that we start the composite. If you look at the CONTENT_FRAME_TIME regression relative graphs (command 34 on https://gist.github.com/mattwoodrow/6743c06bf847c8adbe68a1ef95adaf34) there is a spike at 12 and 36, which corresponds to 200ms and 600ms. Not sure if that's related or not, or why 24frames/400ms doesn't show up.
Attachment #9025529 - Flags: feedback?(sotaro.ikeda.g)
Blocks: 1481950
:mattwoodrow, in which platform did you get the logs in  Comment 1? I wonder if shader compiling time is related to the long latency.
Flags: needinfo?(matt.woodrow)
On windows, when there were no shader disk cache, I saw a lot of logs during start up with attachment 9025529 [details] [diff] [review].
Frame stats during start-up were very different between with shader disk cache and without shader disk cache. So, it might be better to check if program binary was loaded from shader disk cache or not on Windows telemetry.

If there is no shader disk cache, Firefox start-up takes longer and Render Thread was blocked around 600ms. And Compositor thread was not scheduled for 400ms - 500ms. It seemed to happen just because CPU was too busy.
  https://perfht.ml/2TneH0S

I manually cleaned shader disk cache by clearing one of "gfx-shader-check.*" in about:config. Then the shader disk cache is deleted in nest start up.
(In reply to Sotaro Ikeda [:sotaro] from comment #5)
> Frame stats during start-up were very different between with shader disk
> cache and without shader disk cache. So, it might be better to check if
> program binary was loaded from shader disk cache or not on Windows telemetry.
> 
> If there is no shader disk cache, Firefox start-up takes longer and Render
> Thread was blocked around 600ms. And Compositor thread was not scheduled for
> 400ms - 500ms. It seemed to happen just because CPU was too busy.
>   https://perfht.ml/2TneH0S

In this case, there was long delay between calling WebRenderBridgeParent::ScheduleGenerateFrame() and calling WebRenderBridgeParent::MaybeGenerateFrame(). But WebRenderBridgeParent::CompositeToTarget() was called regularly.

The MaybeGenerateFrame() was not called because "wr::RenderThread::Get()->TooManyPendingFrames(mApi->GetId())" returned true. On Render thread, RenderThread::UpdateAndRender() took very long time for compiling shader programs.
(In reply to Matt Woodrow (:mattwoodrow) from comment #2)
> Comment on attachment 9025529 [details] [diff] [review]
> WIP
> 
> Sotaro, would be good to have your feedback on this too.
> 
> The slowest frame here seems like a bug where we might not be triggering the
> composite after scene building is done.
> 
> Could be related to the reftest hang you and tnikkel are looking at?

I think it is a different problem. As in Comment 6, if RenderThread::UpdateAndRender() is busy for long time, MaybeGenerateFrame() is not called because of too many pending frames.

> It's extra weird that it's right on 400ms that we start the composite. If
> you look at the CONTENT_FRAME_TIME regression relative graphs (command 34 on
> https://gist.github.com/mattwoodrow/6743c06bf847c8adbe68a1ef95adaf34) there
> is a spike at 12 and 36, which corresponds to 200ms and 600ms. Not sure if
> that's related or not, or why 24frames/400ms doesn't show up.

It happened easily when there was no shader disk cache. When WebRenderBridgeParent::RecvSetDisplayList() received first content frame(not display list for chrome), Render thread was alreay busy for compiling shaders. The MaybeGenerateFrame() was not called because of too many pending frames for around 600ms.
Comment on attachment 9025529 [details] [diff] [review]
WIP

Review of attachment 9025529 [details] [diff] [review]:
-----------------------------------------------------------------

The patch worked well! In addition to it, it seems nice if we could have the following info.

- How many vsync frames were skipped before calling MaybeGenerateFrame()
- Whether program binaries were loaded from shader disk cache.
Attachment #9025529 - Flags: feedback?(sotaro.ikeda.g) → feedback+
(In reply to Sotaro Ikeda [:sotaro] from comment #8)

> 
> - How many vsync frames were skipped before calling MaybeGenerateFrame()

This seems hard, since we have a list of pending frames in WebRenderBridgeParent::mPendingTransactionIds, but some of them might be still scene-building, and we don't know which ones we are skipping when we skip MaybeGenerateFrame.

I'm going to add main-thread time, and URL too.
Flags: needinfo?(matt.woodrow)
Comment on attachment 9025529 [details] [diff] [review]
WIP

Review of attachment 9025529 [details] [diff] [review]:
-----------------------------------------------------------------

It would be nice to have renderer time in here too.

::: gfx/layers/wr/WebRenderBridgeParent.cpp
@@ +1925,5 @@
> +                                               aStats ? (double(aStats->resource_upload_time) / 1000000.0) : 0.0,
> +                                               aStats ? (double(aStats->gpu_cache_upload_time) / 1000000.0) : 0.0,
> +                                               transactionId.mTxnStartTime,
> +                                               transactionId.mRefreshStartTime,
> +                                               transactionId.mSceneBuiltTime));

We probably want to cap this or have it preffed off by default or something.
(In reply to Jeff Muizelaar [:jrmuizel] from comment #10)
> Comment on attachment 9025529 [details] [diff] [review]
> WIP
> 
> Review of attachment 9025529 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> It would be nice to have renderer time in here too.

Ok, added that locally.

> 
> ::: gfx/layers/wr/WebRenderBridgeParent.cpp
> @@ +1925,5 @@
> > +                                               aStats ? (double(aStats->resource_upload_time) / 1000000.0) : 0.0,
> > +                                               aStats ? (double(aStats->gpu_cache_upload_time) / 1000000.0) : 0.0,
> > +                                               transactionId.mTxnStartTime,
> > +                                               transactionId.mRefreshStartTime,
> > +                                               transactionId.mSceneBuiltTime));
> 
> We probably want to cap this or have it preffed off by default or something.

The gfxPlatform side caps the list to the top 10 slowest frames.
Depends on: 1507686
Pushed by mwoodrow@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/52a798ad6583
Record detailed statistics about slow WebRender frames in about:support. r=jrmuizel
https://hg.mozilla.org/mozilla-central/rev/52a798ad6583
Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla65
Attachment #9025529 - Flags: feedback?(jmuizelaar) → feedback+
You need to log in before you can comment on or make changes to this bug.