Re-enable layout/reftests/bugs/593243-1.html (aka fix invalidation detection in WebRenderLayerManager)

RESOLVED FIXED in Firefox 58

Status

()

enhancement
P1
normal
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: kats, Assigned: ethlin)

Tracking

(Blocks 1 bug)

Other Branch
mozilla58
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox58 fixed)

Details

(Whiteboard: [wr-mvp] [gfx-noted])

Attachments

(3 attachments)

Right now layout/reftests/bugs/593243-1.html is marked as skip-if(webrender). This bug is to re-enable it.

The reason it's marked skip-if is that it intermittently times out during load, waiting for the reftest-wait to be removed. I can reproduce this locally if I run the reftest repeatedly (e.g. with --run-until-failure --repeat 50) under rr chaos mode.

I've been chasing down the problem. What seems to be happening is that after the test case modifies the style property of the 'box' element [1] it never gets the MozAfterPaint event it's waiting for to continue.

The MozAfterPaint event in turn seems to never get fired because the mFireAfterPaintEvents flag in the presContext is false when the composite finishes. That, in turn, is false because we never set an invalidation rect, because the WebRenderLayerManager::ShouldNotifyInvalidation() call at [2] returns false.

I'm trying to figure out why ShouldNotifyInvalidation returns false, because there's clearly a change in the style property that should mark stuff invalid.

[1] http://searchfox.org/mozilla-central/rev/f54c1723befe6bcc7229f005217d5c681128fcad/layout/reftests/bugs/593243-1.html#17
[2] http://searchfox.org/mozilla-central/rev/f54c1723befe6bcc7229f005217d5c681128fcad/layout/painting/nsDisplayList.cpp#2190
Also, the "intermittent" nature of this problem is because this only happens when the refresh driver is ticked between the call to ScheduleViewManagerFlush() and a subsequent call to GetIsMozAfterPaintPending(). In most cases the GetIsMozAfterPaintPending() call happens soon after the ScheduleViewManagerFlush() call, so that results in a dummy zero-size rect getting set as the invalidation rect, and that allows the MozAfterPaint event to get fired once the composition is complete. If the refresh driver is ticked before the GetIsMozAfterPaintPending() call happens, then the mViewManagerFlushPending flag is cleared as part of the paint and so the subsequent call to GetIsMozAfterPaintPending() does *not* set the zero-size invalidation rect. This is the problematic scenario.
Some discussion with mattwoodrow below. It boils down to the fact that just calling IsInvalid on each display item is going to miss some kinds of invalidation, and we'd have to use DLBI code to catch it. Alternatively we could go back Morris' original prototype which does a memcmp on the built display list which is less code.


16:45:09 kats: mattwoodrow: so right now in WebRenderLayerManager this is how we detect "invalidation": http://searchfox.org/mozilla-central/rev/f54c1723befe6bcc7229f005217d5c681128fcad/gfx/layers/wr/WebRenderLayerManager.cpp#332-338 - as we iterate over each display item, we call IsInvalid for it and set a flag. Is this sufficient, or will it miss cases? (in particular it seems to be missing the invalidation that should be generated by this: http://searchfox.org/mozilla-central/rev/f54c1723befe6bcc7229f005217d5c681128fcad/layout/reftests/bugs/593243-1.html#17
16:47:00 mattwoodrow: kats: You are completely correct, it’ll miss that case
16:47:17 kats: mattwoodrow: how should we catch that (and any other missing cases)?
16:47:18 mattwoodrow: IsInvalid() only picks up painting style changes
16:47:44 mattwoodrow: kats: You need to use the DLBI code
16:48:14 mattwoodrow: http://searchfox.org/mozilla-central/rev/f54c1723befe6bcc7229f005217d5c681128fcad/layout/painting/FrameLayerBuilder.cpp#4554
16:48:26 kats: uh-oh, FrameLayerBuilder
16:49:49 kats: mattwoodrow: so for WR we don't need a precise rect, we just need to know if something changed
16:50:12 kats: mattwoodrow: can we use a simplified version of DLBI based on that?
16:50:40 mattwoodrow: kats: You’ll still need most of the logic from that function
16:51:13 kats: mattwoodrow: would it be hard to extract the logic so it doesn't depend on layers-specific things and FLB?
16:52:23 kats: mattwoodrow: all we have in WebRenderLayerManager is a display list and a display list builder. no FLB or layers
16:52:57 mattwoodrow: kats: I don’t think it’d be too hard
16:53:08 mattwoodrow: It looks like the only FLB member we use there is mDisplayListBuilder
16:53:37 kats: where do the DisplayItemData things come from?
16:54:53 mattwoodrow: They’re the retained state we keep between paints
16:55:16 mattwoodrow: stores a blob of data about the display item (for the comparison) as well as what the clip was, and what layer we got assigned to
16:55:45 mattwoodrow: It seems easiest for WR to keep generating those and just store a null Layer
16:57:41 kats: mattwoodrow: back when the invalidation-detection code for WR was added, morris had a different approach in a WIP where he basically did a memcmp on the prev/current serializations of the display list. do you think that approach might be viable?
16:57:47 kats: https://bugzilla.mozilla.org/page.cgi?id=splinter.html&bug=1391202&attachment=8898224
16:59:30 mattwoodrow: If you only care about a boolean result, then yeah, seems like it would
16:59:41 mattwoodrow: This is going to break a lot of mochitests haha
17:00:07 kats: mattwoodrow: good thing we're not running those mochitests yet! :)
17:00:17 kats: mattwoodrow: ok i'll give it some more thought. good to know what the options are
17:00:20 mattwoodrow: Tomorrow-kats’ problem!
17:00:27 kats: yeah no kidding
17:01:10 mattwoodrow: Most of the mochitests that use the invalidation rectangles are doing so as a proxy for a performance test
17:01:41 mattwoodrow: Assuming that that fundamentally won’t work with WR, we might need an alternative way to test that those things still perform well
17:02:15 kats: ok, good to know
Blocks: 1391202
Summary: Re-enable layout/reftests/bugs/593243-1.html → Re-enable layout/reftests/bugs/593243-1.html (aka fix invalidation detection in WebRenderLayerManager)
I tried to get a better handle on the high level flow here. This is my current understanding of how things happen:

Non-WebRender:
1) As stuff in the DOM/style changes, SchedulePaint gets called on frames. Some of the SchedulePaint calls request paints [1] and others do not. These SchedulePaints can be "false positives" in that they might request a paint when in fact nothing has visibly changed.
2) At the refresh driver tick, we check to see if any of the flags that require painting are set [2]. If not, we try to do an empty transaction. If the empty transaction is successful, we're done.
3) If we have painting flags, or the empty transaction fails, we start a full transaction.
4) We build the display list, run DLBI, repaint things that are needed, and push them over to the compositor.
5) If we had invalidation and a non-empty invalid rect we then fire a MozAfterPaint event. Otherwise, we don't.

WebRender:
1) Same as (1) above
2) Same as (2) above, except with layers-free WR the empty transaction is *never* successful [3].
3) We always start a full transaction.
4) We build the display list, paint everything (i.e. build the WR display list), and push it over to the compositor side.
5) Now we call IsInvalid() on each display item, and if anything returns true, we fire a MozAfterPaint event. Otherwise, we don't.

So there's two problems here. One is that we don't support empty transactions with WR which means we're doing a lot more work than we need to. That's bug 1403176, and can be resolved separately. The other is that because we're not running the full DLBI algorithm, we don't have an accurate picture of what changed and what didn't. This manifests in two ways:

(a) in the cases where DLBI would have detected "nothing changed" and avoided a paint, we go ahead and do a paint anyway. This happens in cases where SchedulePaint has been called with flags that request a paint, but nothing actually changed, and DLBI was able to filter it out.
(b) in some cases where DLBI would have detected "something changed" and fired a MozAfterPaint, our calls to IsInvalid return false, so we don't fire the MozAfterPaint event, even though we actually did paint the update correctly. That's what's causing the reftest-wait timeout in the 593243-1.html reftest.

On the call today :jrmuizel said he was opposed to pulling in DLBI, but I'm not sure if we have any other way of fixing problem (a). We could just not fix it and live with a certain amount of overpainting. I'm not sure how much that would be. Problem (b) could be fixed by the memcmp approach. We might have other options as well.

[1] http://searchfox.org/mozilla-central/rev/f54c1723befe6bcc7229f005217d5c681128fcad/layout/generic/nsFrame.cpp#6446
[2] http://searchfox.org/mozilla-central/rev/f54c1723befe6bcc7229f005217d5c681128fcad/layout/base/PresShell.cpp#6379,6386-6388
[3] http://searchfox.org/mozilla-central/rev/f54c1723befe6bcc7229f005217d5c681128fcad/gfx/layers/wr/WebRenderLayerManager.cpp#187
Status: NEW → ASSIGNED
Priority: P3 → P1
Whiteboard: [gfx-noted][wr-mvp][triage] → [wr-mvp] [gfx-noted]
Priority: P1 → --
Whiteboard: [wr-mvp] [gfx-noted] → [gfx-noted]
Priority: -- → P1
Whiteboard: [gfx-noted] → [wr-mvp] [gfx-noted]
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #3)
> On the call today :jrmuizel said he was opposed to pulling in DLBI, but I'm
> not sure if we have any other way of fixing problem (a). We could just not
> fix it and live with a certain amount of overpainting. I'm not sure how much
> that would be. Problem (b) could be fixed by the memcmp approach. We might
> have other options as well.

So it seems like the easiest thing to do for now is to always fire MozAfterPaint everytime we paint. i.e. ignoring the calls to IsInvalid. This will make it so we have no filters to prevent painting, which isn't great but we can probably see how big a problem that is in practice and craft an approach to address it when it comes up.
I suspect if we fire that event unconditionally we might very well run into infinite painting loops. All it would take is one MozAfterPaint listener somewhere that triggers another normally-a-no-op SchedulePaint() call of any kind. But I'll try it and see what happens. We should probably fix the EndEmptyTransaction case before landing this change, assuming it's landable.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #5)
> I suspect if we fire that event unconditionally we might very well run into
> infinite painting loops. All it would take is one MozAfterPaint listener
> somewhere that triggers another normally-a-no-op SchedulePaint() call of any
> kind. But I'll try it and see what happens. We should probably fix the
> EndEmptyTransaction case before landing this change, assuming it's landable.

Note that the reftest harness is an example of this, since it calls DrawWindow in response to MozAfterPaint (and DrawWindow can/will end up firing another MozAfterPaint).
Matt, do you know what the status is on retained display lists? I'm wondering if getting that in place would make it easier to a sort of simplified DLBI by just flagging any changes to the retained display list on the next paint.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #6)
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=08ef88d181c65e10975e0d890947d226e7763d7a

So yeah, this has a lot of failures because we keep infinitely painting and there's always a paint pending so the reftest harness never gets to take a snapshot.
I had a wip in bug 1394336, but I couldn't reproduce the problem in bug 1394336. According to bug 1394336 comment 8, the wip doesn't work for the reflow case.
I just reworked the wip to save the style change flag in the nsPresContext, so it can cover the reflow case. Based my local test, it works for ex-unit-1-dynamic.html, though I couldn't reproduce 593243-1.html locally. Try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=c34f9a00647979c5e7bd5b746c5fd44d25943e75&selectedJob=134419041

There is a failure (745025-1.html) in the try push. The testcase is about canvas. I think we may miss something when detecting the canvas changes.
Posted patch wipSplinter Review
Matt, does this updated WIP ^ address the concern you had in bug 1394336 comment 8?
Flags: needinfo?(matt.woodrow)
I think it does!

Rather than adding something new to presshell, can we just check to see if anyone scheduled a viewmanager flush on the refresh driver?

If nobody has scheduled a paint (and we're painting a DrawWindow/PresShell:RenderDocument call), then there must not be any changes since last paint.
Flags: needinfo?(matt.woodrow)
(In reply to Matt Woodrow (:mattwoodrow) from comment #13)
> I think it does!
> 
> Rather than adding something new to presshell, can we just check to see if
> anyone scheduled a viewmanager flush on the refresh driver?
> 
> If nobody has scheduled a paint (and we're painting a
> DrawWindow/PresShell:RenderDocument call), then there must not be any
> changes since last paint.

I tested this idea and the result[1] looks good, at least same as my original approach. And we can remove the original invalidation calculations.

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=5fe74c321fea7c967ca503912791572fec107e92&selectedJob=134584076
In my patch, I also remove some random-if annotations in R8. I still need to create a new boolean variable since we always reset 'mViewManagerFlushIsPending' before painting. kats, please feel free to have another patch if you think it's not a good way to fix this problem.

Try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=0e4d8799b2da37e46fa14ad32383766502396ce5&selectedJob=134616657
Comment on attachment 8914700 [details]
Bug 1404091 - In layers-free mode, we should do NotifyInvalidation after EndTransaction if there is any scheduled flush.

https://reviewboard.mozilla.org/r/186006/#review191086

This looks ok to me, thanks!
Attachment #8914700 - Flags: review?(bugmail) → review+
Comment on attachment 8914701 [details]
Bug 1404091 - Re-enable some skip-if and random-if tests after the fix.

https://reviewboard.mozilla.org/r/186008/#review191090

I did a try push with these patches and retriggered the reftest jobs a bunch just to make sure they're solid: https://treeherder.mozilla.org/#/jobs?repo=try&revision=9417fbf688eabf42a0bed87bfa973fb579f8ee3f - looking good so far. I can autoland these patches once all the jobs are done.
Attachment #8914701 - Flags: review?(bugmail) → review+
Assignee: bugmail → ethlin
We're sorry, Autoland could not rebase your commits for you automatically. Please manually rebase your commits and try again.

hg error in cmd: hg rebase -s 7b0aa3c021f0 -d 89ffef35daad: rebasing 423869:7b0aa3c021f0 "Bug 1404091 - In layers-free mode, we should do NotifyInvalidation after EndTransaction if there is any scheduled flush. r=kats"
merging gfx/layers/wr/WebRenderLayerManager.cpp
merging gfx/layers/wr/WebRenderLayerManager.h
merging layout/painting/nsDisplayList.cpp
warning: conflicts while merging gfx/layers/wr/WebRenderLayerManager.h! (edit, then use 'hg resolve --mark')
unresolved conflicts (see hg resolve, then hg rebase --continue)
Pushed by ethlin@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/6016270b0ddd
In layers-free mode, we should do NotifyInvalidation after EndTransaction if there is any scheduled flush. r=kats
https://hg.mozilla.org/integration/autoland/rev/deeb7f472519
Re-enable some skip-if and random-if tests after the fix. r=kats
https://hg.mozilla.org/mozilla-central/rev/6016270b0ddd
https://hg.mozilla.org/mozilla-central/rev/deeb7f472519
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla58
You need to log in before you can comment on or make changes to this bug.