Closed Bug 1481950 Opened 6 years ago Closed 2 years ago

Find out why WebRender's CONTENT_FRAME_TIMEs are what they are

Categories

(Core :: Graphics: WebRender, enhancement, P3)

63 Branch
enhancement

Tracking

()

RESOLVED WONTFIX
Tracking Status
firefox63 --- affected

People

(Reporter: jrmuizel, Unassigned)

References

(Depends on 2 open bugs)

Details

(Whiteboard: [needs-investigation])

Attachments

(1 file, 1 obsolete file)

The frame time graph in https://docs.google.com/document/d/1Gh5de2b9YB4FMwDLYoAU9xnEBEkHzcaF3WUoJQs0lOM/edit# is surprising. WebRender has some better scores at the low end but is generally worse. We should make sure it all makes sense.
One cause of the difference seems to be EmptyTransaction handling. ShadowLayerForwarder::EndTransaction() does not forward transaction when transaction is empty.
  https://dxr.mozilla.org/mozilla-central/source/gfx/layers/ipc/ShadowLayers.cpp#643

But WebRenderLayerManager::EndEmptyTransaction() almost always forwards transactions.
  https://dxr.mozilla.org/mozilla-central/source/gfx/layers/wr/WebRenderLayerManager.cpp#180

And WebRenderBridgeParent::RecvEmptyTransaction() returns did composite by calling cbp->NotifyPipelineRendered() if the transaction does not have updates.
  https://dxr.mozilla.org/mozilla-central/source/gfx/layers/wr/WebRenderBridgeParent.cpp#911
Bug 1482032 is created for comment 1.
Depends on: 1482032
Thanks Sotaro!

It might also make sense to only register a value for CONTENT_FRAME_TIME if we actually drew something. Both backends seem to have fallbacks path where we call DidComposite/NotifyPipelineRendered even when no work was done.

I think we need some code to run to ensure MozAfterPaint is delivered (since JS can be waiting for it), but we should be able to skip recording the telemetry value.
I reproduced this earlier today (by setting a breakpoint when CONTENT_FRAME_TIME was registered with a value less than 80), and it came from the real render callback (NotifyDidRender).

That might be a sign of another bug, or might have just been a weird one off.
Having profiler markers (with the epoch/transaction id included) for all the events here would make it easier to visualize.
There is also additional difference in WebRenderBridgeParent::RecvEmptyTransaction(). !ScrollUpdatesMap.empty() trigger composition, but it does not generate next webrender epoch. It seems better to generate next webrender epoch in this case.

https://dxr.mozilla.org/mozilla-central/source/gfx/layers/wr/WebRenderBridgeParent.cpp#870
(In reply to Sotaro Ikeda [:sotaro] from comment #6)
> There is also additional difference in
> WebRenderBridgeParent::RecvEmptyTransaction(). !ScrollUpdatesMap.empty()
> trigger composition, but it does not generate next webrender epoch. It seems
> better to generate next webrender epoch in this case.
> 
> https://dxr.mozilla.org/mozilla-central/source/gfx/layers/wr/
> WebRenderBridgeParent.cpp#870

Created Bug 1482050.
Depends on: 1482050
Comment 4 might be related to Comment 6.
(In reply to Sotaro Ikeda [:sotaro] from comment #8)
> Comment 4 might be related to Comment 6.

That does sound possible!

If the call to NotifyDidRender was waiting in the event queue when we run RecvEmptyTransaction (and we hit the path that you mentioned), then it would add to mPendingTransactionIds and resolve it immediately afterwards.
Assignee: sotaro.ikeda.g → nobody
Attachment #8999499 - Attachment is obsolete: true
I added profiler markers:

Wr-off: https://perfht.ml/2w4uRBy
Wr-on: https://perfht.ml/2w304Fp

From this it looks like there's a consistent disparity between these measurements even in scenarios where we can consistently paint at 60fps.
Looking at this more closely the differences seem explainable. However the metric doesn't necessarily reflect being better.

Moving time from the content thread to compositor thread will make CONTENT_FRAME_TIME worse but doesn't necessarily reflect a worse user experience.

E.g. imagine you have 5ms content time and 1ms composite time. If this moves to 3ms content time and 3ms composite time the CONTENT_FRAME_TIME will change from ~17ms to 19ms, but the user experience is the same.
It seems like it would still be useful to compare CONTENT_FRAME_TIME when values are rounded to full frame multiples. I'm going to look at doing that.
Depends on: 1483099
Depends on: 1484305
Another reason CONTENT_FRAME_TIMEs can be unexpectedly long is if a content paint gets stuck waiting behind another slow paint.

Here's an example:

- Content process 5 submits paint at 2.693s
- RenderBackend takes a long time (because of font loading)
- Content process 3 submits paint at 2.698s
- Paint from content process 5 finishes 41ms later
- We wait for the next vsync
- Work from content process 3 starts and finishes 5.5ms later
- Paint from content process 3 shows up as taking 50ms even though it only actually took 8+6 = 14ms

Here's a profile of this happening: https://perfht.ml/2Mp2itJ

This is much less likely to happen without WebRender because the expensive stuff tends to happen in the content process and so we're less likely to end up waiting for a previous composite to finish.
(In reply to Jeff Muizelaar [:jrmuizel] from comment #15)
> Another reason CONTENT_FRAME_TIMEs can be unexpectedly long is if a content
> paint gets stuck waiting behind another slow paint.
> 
> Here's an example:
> 
> - Content process 5 submits paint at 2.693s
> - RenderBackend takes a long time (because of font loading)
> - Content process 3 submits paint at 2.698s
> - Paint from content process 5 finishes 41ms later
> - We wait for the next vsync
> - Work from content process 3 starts and finishes 5.5ms later
> - Paint from content process 3 shows up as taking 50ms even though it only
> actually took 8+6 = 14ms
> 
> Here's a profile of this happening: https://perfht.ml/2Mp2itJ
> 
> This is much less likely to happen without WebRender because the expensive
> stuff tends to happen in the content process and so we're less likely to end
> up waiting for a previous composite to finish.

That's somewhat realistic though isn't it, since the content process 3 pixels do take longer to get to the screen when this happens?
If only one of those two content processes are actually presenting to the screen, should we skip recording the score for the other one? Since it's not visually relevant?
It's possible both paints are actually ending up in the screen in which case the metric is not actually wrong. It is taking 50ms from start to end, it's just a bit surprising the short paint gets blamed.

I'm thinking about putting together a metric that accumulates the time where we're actually doing work for a paint. This won't be as 'real' of a measure but it will be simpler to reason about.
(In reply to Jeff Muizelaar [:jrmuizel] from comment #18)
> It's possible both paints are actually ending up in the screen in which case
> the metric is not actually wrong. It is taking 50ms from start to end, it's
> just a bit surprising the short paint gets blamed.
> 
> I'm thinking about putting together a metric that accumulates the time where
> we're actually doing work for a paint. This won't be as 'real' of a measure
> but it will be simpler to reason about.

Right, makes sense. It's nice to know how long it took to get to the screen (even if it wasn't at fault), but it's nice to know the other version too.

Are you planning on trying to record time spent on 'background' threads (like scene building, OMTP etc)? Are you going to sum time spent on all sub-threads or just the total wall time for a segment of work? Lots of options, I'm not sure what makes the most sense for comparison.
Depends on: 1484366
Depends on: 1486647
Priority: -- → P1
We can't release WR to the field without answering this, but we can let R ride to beta.  Marking as [needs-investigation] so we look at this sooner.
Priority: P1 → P2
Whiteboard: [needs-investigation]
Priority: P2 → P3
Priority: P3 → P2
Depends on: 1503730
Assignee: nobody → matt.woodrow
I did a new telemetry analysis using the new metrics we've added recently: https://gist.github.com/mattwoodrow/a96517f65f3c813a1deaccc84411956b

Our current understanding is:

* Frames containing SVG are usually much slower, but even with them excluded, CONTENT_FRAME_TIME is significantly regressed.

* Upload times have negligible impact on the CONTENT_FRAME_TIME.

* CONTENT_FULL_PAINT_TIME (total time spent in the 'content' phase = main thread + scene building) is significantly worse.


In order to not drop frames we need to do all other browser work (JS, style, layout), plus the CONTENT_FULL_PAINT_TIME work within one vsync interval (usually 16.7ms @ 60hz).

Allowing an estimated 4.7ms for the other work, and setting 12ms as the threshold for 'too slow' CONTENT_FULL_PAINT_TIME we see that WR misses this 11.2% of the time, whereas non-WR only misses it around 3%.

That seems like a significant difference, and appears to be the biggest contributor to slow CONTENT_FRAME_TIME measurements.

Ongoing work is:

Bug 1508502 - Try recording a version of CONTENT_FULL_PAINT_TIME that includes the JS/style/layout work, so that we don't have to estimate the 4.7ms value and can more accurately know how much of the CONTENT_FRAME_TIME regression comes from here.

Bug 1507680 - Add a detailed breakdown of the slowest frames encountered so that users can provide much more actionable reports when they encounter slow behaviour. We have the option of extending this into telemetry too. Once this bug lands, I will send out a dev-platform email asking for users to submit their results.

Bug 1508188 - Meta bug for tracking performance issues that are known to affect CONTENT_FULL_PAINT_TIME.

Bug 1508116 - Known slowness in scene building due to building massive vectors. Dan is investigating performance improvements.
An update on the progress here. CONTENT_FULL_PAINT_TIME is now generally better with WebRender than without. CONTENT_FRAME_TIME is still significantly better but we've added a new metric called CONTENT_FRAME_TIME_VSYNC. This starts measuring earlier which lets us better reason about it's times relative to the refresh rate. WebRender is only a little bit worse ~7% at this new metric. In bug 1517980 I've changed CONTENT_FRAME_TIME_REASON to use the new metric and that should help us get a better sense how to improve further.
Priority: P2 → P3

We're good enough not to block on this anymore.

Priority: P3 → P4

We care about CONTENT_FRAME_TIME_VSYNC which is a more representative metric which we are currently basically at parity with.

Blocks: wr-67
No longer blocks: stage-wr-trains
Priority: P4 → P3
Blocks: wr-68
No longer blocks: wr-67
Blocks: wr-70
No longer blocks: wr-68
No longer blocks: wr-70

The bug assignee didn't login in Bugzilla in the last 7 months, so the assignee is being reset.

Assignee: matt.woodrow → nobody

Probably no longer relevant.

Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: