1481950 - Find out why WebRender's CONTENT_FRAME_TIMEs are what they are

Reporter

Description

•

6 years ago

The frame time graph in https://docs.google.com/document/d/1Gh5de2b9YB4FMwDLYoAU9xnEBEkHzcaF3WUoJQs0lOM/edit# is surprising. WebRender has some better scores at the low end but is generally worse. We should make sure it all makes sense.

Sotaro Ikeda [:sotaro]

Comment 1

•

6 years ago

One cause of the difference seems to be EmptyTransaction handling. ShadowLayerForwarder::EndTransaction() does not forward transaction when transaction is empty.
  https://dxr.mozilla.org/mozilla-central/source/gfx/layers/ipc/ShadowLayers.cpp#643

But WebRenderLayerManager::EndEmptyTransaction() almost always forwards transactions.
  https://dxr.mozilla.org/mozilla-central/source/gfx/layers/wr/WebRenderLayerManager.cpp#180

And WebRenderBridgeParent::RecvEmptyTransaction() returns did composite by calling cbp->NotifyPipelineRendered() if the transaction does not have updates.
  https://dxr.mozilla.org/mozilla-central/source/gfx/layers/wr/WebRenderBridgeParent.cpp#911

Sotaro Ikeda [:sotaro]

Comment 2

•

6 years ago

Bug 1482032 is created for comment 1.

Sotaro Ikeda [:sotaro]

Updated

•

6 years ago

Depends on: 1482032

Matt Woodrow (:mattwoodrow)

Comment 3

•

6 years ago

Thanks Sotaro!

It might also make sense to only register a value for CONTENT_FRAME_TIME if we actually drew something. Both backends seem to have fallbacks path where we call DidComposite/NotifyPipelineRendered even when no work was done.

I think we need some code to run to ensure MozAfterPaint is delivered (since JS can be waiting for it), but we should be able to skip recording the telemetry value.

Matt Woodrow (:mattwoodrow)

Comment 4

•

6 years ago

I reproduced this earlier today (by setting a breakpoint when CONTENT_FRAME_TIME was registered with a value less than 80), and it came from the real render callback (NotifyDidRender).

That might be a sign of another bug, or might have just been a weird one off.

Matt Woodrow (:mattwoodrow)

Comment 5

•

6 years ago

Having profiler markers (with the epoch/transaction id included) for all the events here would make it easier to visualize.

Sotaro Ikeda [:sotaro]

Comment 6

•

6 years ago

There is also additional difference in WebRenderBridgeParent::RecvEmptyTransaction(). !ScrollUpdatesMap.empty() trigger composition, but it does not generate next webrender epoch. It seems better to generate next webrender epoch in this case.

https://dxr.mozilla.org/mozilla-central/source/gfx/layers/wr/WebRenderBridgeParent.cpp#870

Sotaro Ikeda [:sotaro]

Comment 7

•

6 years ago

(In reply to Sotaro Ikeda [:sotaro] from comment #6)
> There is also additional difference in
> WebRenderBridgeParent::RecvEmptyTransaction(). !ScrollUpdatesMap.empty()
> trigger composition, but it does not generate next webrender epoch. It seems
> better to generate next webrender epoch in this case.
> 
> https://dxr.mozilla.org/mozilla-central/source/gfx/layers/wr/
> WebRenderBridgeParent.cpp#870

Created Bug 1482050.

Sotaro Ikeda [:sotaro]

Updated

•

6 years ago

Depends on: 1482050

Sotaro Ikeda [:sotaro]

Comment 8

•

6 years ago

Comment 4 might be related to Comment 6.

Matt Woodrow (:mattwoodrow)

Comment 9

•

6 years ago

(In reply to Sotaro Ikeda [:sotaro] from comment #8)
> Comment 4 might be related to Comment 6.

That does sound possible!

If the call to NotifyDidRender was waiting in the event queue when we run RecvEmptyTransaction (and we hit the path that you mentioned), then it would add to mPendingTransactionIds and resolve it immediately afterwards.

Comment hidden (obsolete)

Sotaro Ikeda [:sotaro]

Updated

•

6 years ago

Assignee: sotaro.ikeda.g → nobody

Sotaro Ikeda [:sotaro]

Updated

•

6 years ago

Attachment #8999499 - Attachment is obsolete: true

Jeff Muizelaar [:jrmuizel]

Reporter

Comment 11

•

6 years ago

I added profiler markers:

Wr-off: https://perfht.ml/2w4uRBy
Wr-on: https://perfht.ml/2w304Fp

From this it looks like there's a consistent disparity between these measurements even in scenarios where we can consistently paint at 60fps.

Jeff Muizelaar [:jrmuizel]

Reporter

Comment 12

•

6 years ago

Attached patch Add profiler markers for CONTENT_FRAME_TIME — Details — Splinter Review

Jeff Muizelaar [:jrmuizel]

Reporter

Comment 13

•

6 years ago

Looking at this more closely the differences seem explainable. However the metric doesn't necessarily reflect being better.

Moving time from the content thread to compositor thread will make CONTENT_FRAME_TIME worse but doesn't necessarily reflect a worse user experience.

E.g. imagine you have 5ms content time and 1ms composite time. If this moves to 3ms content time and 3ms composite time the CONTENT_FRAME_TIME will change from ~17ms to 19ms, but the user experience is the same.

Jeff Muizelaar [:jrmuizel]

Reporter

Comment 14

•

6 years ago

It seems like it would still be useful to compare CONTENT_FRAME_TIME when values are rounded to full frame multiples. I'm going to look at doing that.

Jeff Muizelaar [:jrmuizel]

Reporter

Updated

•

6 years ago

Depends on: 1483099

Jeff Muizelaar [:jrmuizel]

Reporter

Updated

•

6 years ago

Depends on: 1484305

Jeff Muizelaar [:jrmuizel]

Reporter

Comment 15

•

6 years ago

Another reason CONTENT_FRAME_TIMEs can be unexpectedly long is if a content paint gets stuck waiting behind another slow paint.

Here's an example:

- Content process 5 submits paint at 2.693s
- RenderBackend takes a long time (because of font loading)
- Content process 3 submits paint at 2.698s
- Paint from content process 5 finishes 41ms later
- We wait for the next vsync
- Work from content process 3 starts and finishes 5.5ms later
- Paint from content process 3 shows up as taking 50ms even though it only actually took 8+6 = 14ms

Here's a profile of this happening: https://perfht.ml/2Mp2itJ

This is much less likely to happen without WebRender because the expensive stuff tends to happen in the content process and so we're less likely to end up waiting for a previous composite to finish.

Matt Woodrow (:mattwoodrow)

Comment 16

•

6 years ago

(In reply to Jeff Muizelaar [:jrmuizel] from comment #15)
> Another reason CONTENT_FRAME_TIMEs can be unexpectedly long is if a content
> paint gets stuck waiting behind another slow paint.
> 
> Here's an example:
> 
> - Content process 5 submits paint at 2.693s
> - RenderBackend takes a long time (because of font loading)
> - Content process 3 submits paint at 2.698s
> - Paint from content process 5 finishes 41ms later
> - We wait for the next vsync
> - Work from content process 3 starts and finishes 5.5ms later
> - Paint from content process 3 shows up as taking 50ms even though it only
> actually took 8+6 = 14ms
> 
> Here's a profile of this happening: https://perfht.ml/2Mp2itJ
> 
> This is much less likely to happen without WebRender because the expensive
> stuff tends to happen in the content process and so we're less likely to end
> up waiting for a previous composite to finish.

That's somewhat realistic though isn't it, since the content process 3 pixels do take longer to get to the screen when this happens?

Matt Woodrow (:mattwoodrow)

Comment 17

•

6 years ago

If only one of those two content processes are actually presenting to the screen, should we skip recording the score for the other one? Since it's not visually relevant?

Jeff Muizelaar [:jrmuizel]

Reporter

Comment 18

•

6 years ago

It's possible both paints are actually ending up in the screen in which case the metric is not actually wrong. It is taking 50ms from start to end, it's just a bit surprising the short paint gets blamed.

I'm thinking about putting together a metric that accumulates the time where we're actually doing work for a paint. This won't be as 'real' of a measure but it will be simpler to reason about.

Matt Woodrow (:mattwoodrow)

Comment 19

•

6 years ago

(In reply to Jeff Muizelaar [:jrmuizel] from comment #18)
> It's possible both paints are actually ending up in the screen in which case
> the metric is not actually wrong. It is taking 50ms from start to end, it's
> just a bit surprising the short paint gets blamed.
> 
> I'm thinking about putting together a metric that accumulates the time where
> we're actually doing work for a paint. This won't be as 'real' of a measure
> but it will be simpler to reason about.

Right, makes sense. It's nice to know how long it took to get to the screen (even if it wasn't at fault), but it's nice to know the other version too.

Are you planning on trying to record time spent on 'background' threads (like scene building, OMTP etc)? Are you going to sum time spent on all sub-threads or just the total wall time for a segment of work? Lots of options, I'm not sure what makes the most sense for comparison.

Jeff Muizelaar [:jrmuizel]

Reporter

Updated

•

6 years ago

Depends on: 1484366

Jeff Muizelaar [:jrmuizel]

Reporter

Updated

•

6 years ago

Depends on: 1486647

Jeff Muizelaar [:jrmuizel]

Reporter

Updated

•

6 years ago

Priority: -- → P1

Jeff Muizelaar [:jrmuizel]

Reporter

Updated

•

6 years ago

Blocks: stage-wr-trains

Maire Reavy [:mreavy]

Comment 20

•

6 years ago

We can't release WR to the field without answering this, but we can let R ride to beta.  Marking as [needs-investigation] so we look at this sooner.

Priority: P1 → P2

Whiteboard: [needs-investigation]

Maire Reavy [:mreavy]

Updated

•

6 years ago

Priority: P2 → P3

Maire Reavy [:mreavy]

Updated

•

6 years ago

Priority: P3 → P2

Matt Woodrow (:mattwoodrow)

Updated

•

6 years ago

Depends on: 1507680, 1507686, 1505858, 1503405

Matt Woodrow (:mattwoodrow)

Updated

•

6 years ago

Depends on: 1503730

Matt Woodrow (:mattwoodrow)

Updated

•

6 years ago

Assignee: nobody → matt.woodrow

Matt Woodrow (:mattwoodrow)

Comment 21

•

6 years ago

I did a new telemetry analysis using the new metrics we've added recently: https://gist.github.com/mattwoodrow/a96517f65f3c813a1deaccc84411956b

Our current understanding is:

* Frames containing SVG are usually much slower, but even with them excluded, CONTENT_FRAME_TIME is significantly regressed.

* Upload times have negligible impact on the CONTENT_FRAME_TIME.

* CONTENT_FULL_PAINT_TIME (total time spent in the 'content' phase = main thread + scene building) is significantly worse.

In order to not drop frames we need to do all other browser work (JS, style, layout), plus the CONTENT_FULL_PAINT_TIME work within one vsync interval (usually 16.7ms @ 60hz).

Allowing an estimated 4.7ms for the other work, and setting 12ms as the threshold for 'too slow' CONTENT_FULL_PAINT_TIME we see that WR misses this 11.2% of the time, whereas non-WR only misses it around 3%.

That seems like a significant difference, and appears to be the biggest contributor to slow CONTENT_FRAME_TIME measurements.

Ongoing work is:

Bug 1508502 - Try recording a version of CONTENT_FULL_PAINT_TIME that includes the JS/style/layout work, so that we don't have to estimate the 4.7ms value and can more accurately know how much of the CONTENT_FRAME_TIME regression comes from here.

Bug 1507680 - Add a detailed breakdown of the slowest frames encountered so that users can provide much more actionable reports when they encounter slow behaviour. We have the option of extending this into telemetry too. Once this bug lands, I will send out a dev-platform email asking for users to submit their results.

Bug 1508188 - Meta bug for tracking performance issues that are known to affect CONTENT_FULL_PAINT_TIME.

Bug 1508116 - Known slowness in scene building due to building massive vectors. Dan is investigating performance improvements.

Jeff Muizelaar [:jrmuizel]

Reporter

Comment 22

•

5 years ago

An update on the progress here. CONTENT_FULL_PAINT_TIME is now generally better with WebRender than without. CONTENT_FRAME_TIME is still significantly better but we've added a new metric called CONTENT_FRAME_TIME_VSYNC. This starts measuring earlier which lets us better reason about it's times relative to the refresh rate. WebRender is only a little bit worse ~7% at this new metric. In bug 1517980 I've changed CONTENT_FRAME_TIME_REASON to use the new metric and that should help us get a better sense how to improve further.

Jeff Muizelaar [:jrmuizel]

Reporter

Updated

•

5 years ago

Priority: P2 → P3

Jeff Muizelaar [:jrmuizel]

Reporter

Comment 23

•

5 years ago

We're good enough not to block on this anymore.

Priority: P3 → P4

Jeff Muizelaar [:jrmuizel]

Reporter

Comment 24

•

5 years ago

We care about CONTENT_FRAME_TIME_VSYNC which is a more representative metric which we are currently basically at parity with.

Jeff Muizelaar [:jrmuizel]

Reporter

Updated

•

5 years ago

Blocks: wr-67
No longer blocks: stage-wr-trains

Priority: P4 → P3

Nicolas Silva [:nical]

Updated

•

5 years ago

Blocks: wr-68
No longer blocks: wr-67

Jeff Muizelaar [:jrmuizel]

Reporter

Updated

•

5 years ago

Blocks: wr-70
No longer blocks: wr-68

Jeff Muizelaar [:jrmuizel]

Reporter

Updated

•

5 years ago

No longer blocks: wr-70

BugBot [:suhaib / :marco/ :calixte]

Comment 25

•

2 years ago

The bug assignee didn't login in Bugzilla in the last 7 months, so the assignee is being reset.

Assignee: matt.woodrow → nobody

Glenn Watson [:gw]

Comment 26

•

2 years ago

Probably no longer relevant.

Status: NEW → RESOLVED

Closed: 2 years ago

Resolution: --- → WONTFIX

patch part 1 - A bit clean up of FlushTransactionIdsForEpoch() 6 years ago Sotaro Ikeda [:sotaro] 2.19 KB, patch		Details \| Diff \| Splinter Review
Add profiler markers for CONTENT_FRAME_TIME 6 years ago Jeff Muizelaar [:jrmuizel] 3.30 KB, patch		Details \| Diff \| Splinter Review