Sometimes we skip frames during autoscroll without going over our frame budget
Categories
(Core :: Graphics: WebRender, defect, P3)
Tracking
()
Tracking | Status | |
---|---|---|
firefox77 | --- | fixed |
People
(Reporter: jrmuizel, Assigned: kats)
References
(Depends on 1 open bug, Blocks 1 open bug)
Details
(Whiteboard: wr-planning)
Attachments
(7 files, 1 obsolete file)
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review |
During HD 4600 investigations I've seen the situation where we skip a frame because we seem to early out from frame building.
Here's what we know:
- This seems to always happen two vsyncs after we do a scene build.
- The scene build happens close to finish around the time of a vsync.
- In the frame after the vsync we get two calls to update_document.
- the first has render_frame=false and would hit an else branch here: https://searchfox.org/mozilla-central/rev/567b68b8ff4b6d607ba34a6f1926873d21a7b4d7/gfx/wr/webrender/src/render_backend.rs#1682 (build_frame and requested_frame are false)
- the second frame is render_frame=true and has build_frame = true as usual
- In the next frame we get a call to update_document(render_frame=true) but this time build_frame is false but requested_frame is true and this is the frame where we fail to render anything causing the frame skip. We don't set build_frame because frame_is_valid=true
Reporter | ||
Comment 1•4 years ago
|
||
Here's a log of scroll messages:
DebugMessage: set_scroll_origin (0,-4706.684) (0,-4721.2964)
DebugMessage: set_scroll_origin (0,-4721.2964) (0,-4735.9087)
DebugMessage: set_scroll_origin (0,-4735.9087) (0,-4750.521)
DebugMessage: set_scroll_origin (0,-4622.521) (0,-4765.1333)
DebugMessage: set_scroll_origin (0,-4765.1333) (0,-4779.7456)
DebugMessage: set_scroll_origin nochange (0,-4779.7456)
DebugMessage: set_scroll_origin (0,-4779.7456) (0,-4794.358)
You can see we double up in one frame and then skip in the next frame.
Updated•4 years ago
|
Assignee | ||
Comment 2•4 years ago
•
|
||
Based on this I expect the following to be happening:
- Scene build
- Vsync, which advances the LastComposeTime in the vsync dispatcher
WebRenderBridgeParent::MaybeGenerateFrame
runs which advances the APZ "current" sample timeupdate_document
runs which samples from APZ here. Because the APZ frame delay is enabled, this actually returns values corresponding to the previous sample time (i.e. the vsync preceding the scene build). Nonetheless, I would expect this to provide updated scroll offsets to WR which should cause an actual frame build instead of bailing out withframe_is_valid=true
. So why that's happening is an open question. But this will also advance the APZ animations to the "current" sample time, which is the LastComposeTime from step 2.update_document
runs again, samples from APZ again. This "current" sample time is the same as before, but the APZ animations were advanced at the end of step 4, so this will return different scroll offsets from what was produced in step 4. And it will not advance APZ animations any further, because the new sample time hasn't changed.- Vsync, which advances the LastComposeTime in the vsync dispatcher
WebRenderBridgeParent::MaybeGenerateFrame
runs which again advances the APZ "current" sample time to the LastComposeTime in step 6.- Another
update_document
happens. Since step 5 didn't advance animations, the scroll offsets here will be the same as in step 5. And APZ animations will get advanced to the timestamp from step 6/7. Since this step doesn't produce new scroll offsets, it makes sense that WR will not repaint anything.
I thought that if, in step 5, we prevent APZ from producing any scroll offsets at all (via the patch I sent :jrmuizel) then it would make step 8 render properly. But I guess we're still missing something there. Would be worth trying to confirm all this by logging the sample time passed to APZCTreeManager::SampleForWebRender and correlating it with the data posted previously. Would also be worth trying to answer the open question in step 4.
Assignee | ||
Comment 3•4 years ago
|
||
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #2)
But I guess we're still missing something there.
We're actually not missing anything, I misunderstood Jeff's comment. The patch does work. The answer to the "open question" in step 4 is that frame_is_valid
does become false, but since render_frame
is also false, it produces a false build_frame
value here. And that in turn means that frame_is_valid
remains false until the second call to update_document
, at which point the frame build will actually happen.
The patch I wrote prevents APZ from emitting modified scroll offsets in step 5, and since frame_is_valid
is still false from the previous update_document
in step 4, we still go ahead and build a new frame in step 5 even without more data from APZ. So I think this produces all the desired results and the patch can be landed with some cleanup.
Updated•4 years ago
|
Assignee | ||
Comment 4•4 years ago
|
||
Updated•4 years ago
|
Assignee | ||
Comment 5•4 years ago
|
||
Assignee | ||
Comment 6•4 years ago
|
||
There appear to be some test failures. I started some retriggers. If they're legit they probably have to do with sampling multiple times with the same test timestamp.
Assignee | ||
Comment 7•4 years ago
|
||
They seem legit. Here's another try push with some better handling of the test timestamp: https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=07f176db1ceb112058bca5c0534cfafda7ddec3c
Assignee | ||
Comment 8•4 years ago
|
||
I reproduced the test failure locally and think I understand what's going on. The problem is that the transaction that's filled out by APZ has both scroll offsets and other animation transforms (in particular for scrollbars and zoom and stuff). And those properties need to be set unconditionally, otherwise the pending properties
don't match the current properties
here and the properties effectively get erased from the frame build. This can result in mispositioned scrollbars and such.
This kind of throws a wrench in the works because it means we need to emit sampling data for some previous timestamp that we no longer still have data for. I'm going to think about this some more.
Assignee | ||
Comment 9•4 years ago
|
||
I came up with something that I thought would work, but upon further consideration I think it will regress the APZ frame delay mechanism. It will still frame-delay animations, but not user input. Anyway I kicked off a try push with that here while I think of other options.
Assignee | ||
Comment 10•4 years ago
|
||
Not really making a lot of progress here. It seems like there's an assumption baked into the frame-delay implementation that we only sample from APZ (via ApplyAsyncContentTransformToTree
or SampleForWebRender
) once per vsync interval, so it's safe to resample into the mComposited*
fields after that happens, in order to set up for the next vsync interval. I think this assumption holds for the layers codepath but not the WR codepath. It doesn't seem possible to rearrange the existing blocks of code to make this work so I might have to think of an alternative implementation for the frame-delay mechanism that doesn't make this assumption.
Assignee | ||
Comment 11•4 years ago
|
||
Currently, the flow is something like this (assuming t0, t1, t2 are evenly spaced vsync times):
- Vsync (at time t=t1)
- Do sampling of APZ transforms (top part of
SampleForWebRender
or inApplyAsyncContentTransformToTree
for non-WR). This emits values that include animations at t=t0 and user input at the last call toSampleCompositedAsyncTransform
which was also roughly at t=t0. - Attempt to advance APZ animations to time t=t2. This trickles down to
AsyncPanZoomController::UpdateAnimation
which does:- Call
SampleCompositedAsyncTransform
, snapshots async state from animations at t=t1 and all received user input into themComposited*
fields. - Advance animations to time t=t2
- Call
If we loop this over and over once per vsync everything works out consistently. The problem is if we get a second sampling (i.e. call to SampleForWebrender
) within the same vsync interval. Let's say that happens within the t=t1 vsync interval described above. Then:
- Do sampling of APZ transforms. This emits values that include animations at t=t1 and user input at the last call to
SampleCompositedAsyncTransform
which was earlier in this vsync interval. - Attempt to advance APZ animations to time t=t2. This trickles down to
AsyncPanZoomController::UpdateAnimation
which does:- Early exit since timestamp is the same as the last time
So the result of the sampling is different in that it includes more recently sampled state. And then on the next vsync interval this happens:
- Vsync (at time t=t2)
- Do sampling of APZ transforms (top part of
SampleForWebRender
or inApplyAsyncContentTransformToTree
for non-WR). This emits values that include animations at t=t1 and user input at the last call toSampleCompositedAsyncTransform
which was partway between t1 and t2. - Attempt to advance APZ animations to time t=t3. This trickles down to
AsyncPanZoomController::UpdateAnimation
which does:- Call
SampleCompositedAsyncTransform
, snapshots async state from animations at t=t2 and all received user input into the mComposited* fields. - Advance animations to time t=t3
- Call
And so the result emitted here is the same as the result for the second sampling in the previous vsync interval. This produces visual jank because we update two frames' worth in a single frame and then "skip" the next frame.
The constraints here are that (1) in order to allow for multiple calls to SampleForWebRender
within a vsync interval, we need to preserve the mComposited*
values all the way through to the end of the vsync interval, rather than clobbering them at the end of the first call to SampleForWebRender
. And that (2) in order to respect frame-delay, we want to snapshot the Metrics()
into the mComposited*
fields pretty early in the vsync interval. If we do it late, then we're capturing user input that arrived later and defeating the frame-delay mechanism. These constraints cannot be simultaneously satisfied while only having one set of mComposited*
fields, so I wrote a set of patches that encapsulates those fields and allows an AsyncPanZoomController to have another one. This way it can snapshot the async state for the next frame while still continue providing the older snapshot during the current vsync interval.
Note that I had considered other simplifications (such as just caching the values emitted by SampleForWebRender) but it's possible to get a layers update between two samplings within a single vsync interval, and handling that case properly defeats those simplification attempts (because in that case the second sampling might have a different set of APZCs, and the async transform may have been reset, etc.).
Here's a try push with the patches: https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=536c7281c179a958215858257284153fadac242d - they have the patches from bug 1627716 as a dependency as deleting the dynamic toolbar also removes some fiddling of the mComposited*
fields.
Assignee | ||
Comment 12•4 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&selectedJob=298879406&revision=9af24ef0cc11d3acb29d53e762bb879e3f6c5ae8 is looking better, had a silly error in the last try push. There's still one test failure here but I believe that's a bug in the test. Will get the patches ready for review soon.
Assignee | ||
Comment 13•4 years ago
|
||
Having to think about multiple codepaths adds complexity and it doesn't seem
like we're going to turn this pref back off anytime soon.
Assignee | ||
Comment 14•4 years ago
|
||
No functional change, since everything was safe already. But this propagates
the proof-of-lock a little further to make it more obvious that fields are
safely accessed.
Depends on D72040
Assignee | ||
Comment 15•4 years ago
|
||
No functional changes intended here, just code getting moved into a helper
class. Note that this patch folds RecalculateCompositedLayoutViewport into
ClampCompositedScrollOffset since there are no longer any independent callers
of the former function (as of bug 1627716).
Depends on D72041
Assignee | ||
Comment 16•4 years ago
|
||
The deque always has size 1, so this patch is functionally a no-op. It sets up
the usage of front() and back() to allow holding more than one item in a future
patch.
Depends on D72042
Assignee | ||
Comment 17•4 years ago
|
||
Again, functionally this is a no-op since instead of replacing the back()
element, we do an emplace_back() followed by a pop_front().
Depends on D72043
Assignee | ||
Comment 18•4 years ago
|
||
We have had to apply this same fix in other tests in the past, for example in
helper_hittest_basic.html.
Depends on D72044
Assignee | ||
Comment 19•4 years ago
|
||
Refer to bug 1630781 comment 11 for a more detailed explanation of the problem.
A quick summary is that prior to this patch, the sampling operation involved
these steps:
- sample stored APZ state for composition
- advance animations to next vsync
- update stored APZ state to the one just computed
When sampling occured twice within a vsync interval, the stored APZ state
calculated at the end of the first sampling would be produced in the second
sampling, resulting in non-smooth scrolling. Note that the second and third
steps would only run once per vsync interval, but that was sufficient to cause
the problem.
With this patch, the order of the steps is reordered:
- update stored APZ state to that computed in the last vsync interval
- advance animations to next vsync and save it in the queue
- sample stored APZ state for composition
With this ordering (and with the first two steps only running once per vsync
interval), the third step now emits consistent data even if the steps are run
multiple times in a vsync interval. It does mean that the mSampledState deque
can have up to two items in it (front() being the state for the current vsync
interval, and back() being the state computed for the next vsync interval).
Although this problem only affected the WR codepath in practice, it could in
theory occur with the non-WR codepath too, if we for some reason ran the
TransformShadowTree code multiple times in a vsync interval. This patch updates
both codepaths with the new order of steps.
Depends on D72045
Updated•4 years ago
|
Comment 20•4 years ago
|
||
Comment 21•4 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/0e01cc6c7ba9
https://hg.mozilla.org/mozilla-central/rev/a0a7f8bd9b39
https://hg.mozilla.org/mozilla-central/rev/e83a1f37cafd
https://hg.mozilla.org/mozilla-central/rev/84d718cc1c95
https://hg.mozilla.org/mozilla-central/rev/bbce34f0afe3
https://hg.mozilla.org/mozilla-central/rev/a9d62cd638a6
https://hg.mozilla.org/mozilla-central/rev/a8048daebd7e
Assignee | ||
Updated•4 years ago
|
Description
•