Closed Bug 1471962 Opened 2 years ago Closed 2 years ago
Flush scene builder when as part of flushing rendering
59 bytes, text/x-review-board-request
We get sync FlushRendering calls during window resize, and the expectation is that when that call finishes the rendering will have updated to the new window size. However currently WebRenderBridgeParent doesn't flush the scene builder, it has a TODO about that. If we don't flush the scene builder we could just end up re-rendering the old scene which isn't at the new window size. This also addresses the tresize regression I was seeing with the latest version of my patches for servo/webrender#2807.
Well. This helps tresize on linux, but hurts tpaint, ts_paint, ts_paint_heavy, ts_paint_webext on windows. By a lot.
Could this be because those tests have expensive content scenes? That would make this an instance of the problem mentioned in bug 1464086 comment 1.
Oh, I see now that you've filed bug 1471562 about that problem.
It's also not clear to me under what circumstances we do a sync FlushRendering call. It surprised me that we're doing that so much in talos tests like tpaint, but we must be if making that slower causes regressions in these tests. I suspect that fixing the extra render described at  will help a bunch in solving the regression.  https://searchfox.org/mozilla-central/rev/97d488a17a848ce3bebbfc83dc916cf20b88451c/gfx/layers/wr/WebRenderBridgeParent.cpp#885
I haven't checked the code but the cases that I'm aware of where we flush rendering are: - window opening - window focus changes - window resizes
Yeah, it looks likes both tpaint and ts_paint spawn new windows, and so the fact that we're getting a rendering flush seems correct. In the case of ts_paint, it records from process startup to the end of the first paint (at ), so that includes a sync rendering flush, but does NOT include the extra render that I mentioned in comment 5. So removing that render wouldn't help. Without the patch on this bug, the test is effectively measuring the wrong thing, because it hasn't included the display list that got sent over, because the flush doesn't actually flush the scene build. So the regression on ts_paint is a result of measuring things more correctly. I did also observe that the time from process startup to the first parent-process MozAfterPaint is reduced with the patch applied, because the MozAfterPaint is only fired when the epochs line up, and that happens sooner with the explicit scene building flush than without. So if the test were recording the time from process startup to first UI process MozAfterPaint it would register an improvement rather than a regression with this patch. tpaint is a little different; it opens a child window which reports back on the first content MozAfterPaint and it records the time taken for that. However, from my investigation, it looks like the thing that is triggering the MozAfterPaint is the timer at  rather than actually painting that content. So that's totally wrong.  https://searchfox.org/mozilla-central/rev/97d488a17a848ce3bebbfc83dc916cf20b88451c/view/nsViewManager.cpp#361  https://searchfox.org/mozilla-central/rev/97d488a17a848ce3bebbfc83dc916cf20b88451c/layout/base/nsPresContext.cpp#3303
Oho! So it looks like with layers, we send a DidComposite for content-process layers even if those layers are not attached to the parent process layers. i.e. say we create a new tab, and we create the associated RenderFrameParent in the UI process. That RenderFrameParent doesn't actually get painted right away; some code (I presume the front-end) keeps it deactivated somehow until it knows that the content process layers have been painted via a MozAfterPaint, and then it activates the RenderFrameParent. With WR, that MozAfterPaint doesn't happen because, well, the display list wasn't rendered and so the PipelineInfo doesn't contain the epoch for the content process pipeline. After 100ms that timer expires, fires the MozAfterPaint, and then that allows the front-end to activate the RenderFrameParent. If I increase the timer from 100ms to 1000ms the exact same thing happens - it takes 1000ms before the MozAfterPaint fires and unblocks the painting. So I think for consistency with layers we need to make it so that even if a content process display list isn't attached to the parent process we need to send a MozAfterPaint for it.
The patches I have for bug 1473352 help somewhat with the tpaint regression, but it's still around 60% on windows (was >90% without those patches, in comment 1). It's still around 60-70% on ts_paint*. Try push with a hacky version of the changes needed for bug 1473352, plus the FlushRendering change: https://treeherder.mozilla.org/#/jobs?repo=try&revision=456477f60f9b35e9e087f37100d4724f64a33b53
I've been trying to improve the talos numbers on these tests so that we can land the patch for this bug without making the regression worse. However I haven't been very successful. Most of the time seems to be spent in binding the shaders for the new window, as that happens on-demand for the window's first composite. I tried to make it happen a bit earlier as part of an "async setup" step that is triggered on the renderer thread after we create the renderer but before the first composite. It didn't really help since it still takes long enough that it pushes the composite back.
Comment on attachment 8992672 [details] Bug 1471962 - Flush scene builds when flushing rendering. https://reviewboard.mozilla.org/r/257528/#review264414
Attachment #8992672 - Flags: review?(jmuizelaar) → review+
Pushed by firstname.lastname@example.org: https://hg.mozilla.org/integration/autoland/rev/83724fe36744 Flush scene builds when flushing rendering. r=jrmuizel
Moving dependencies to the talos regression bug.
You need to log in before you can comment on or make changes to this bug.