Closed Bug 1437036 Opened 2 years ago Closed Last year

WebRenderLayerManager doesn't override GetLastTransactionId

Categories

(Core :: Graphics: WebRender, defect, P1)

Other Branch
defect

Tracking

()

RESOLVED FIXED
mozilla62
Tracking Status
firefox62 --- fixed

People

(Reporter: kats, Assigned: kats)

References

Details

(Whiteboard: [gfx-noted])

Attachments

(2 files)

Assignee: nobody → bugmail
Looks like implementing this function makes some reftests fail/hang/result in weird behaviour. So fixing it isn't trivial.
I did sniff a try which has a patch for this bug.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=e5bee2fcb151be24409585705a07965359dc02a6

As far as I can tell, all reftest failures are related to bug 1419851.  And also mochitest-1 and mochitest-5 failures are definitely due to bug 1419851.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #1)
> Looks like implementing this function makes some reftests fail/hang/result
> in weird behaviour. So fixing it isn't trivial.

I looked into another MozAfterPaint's problem of a WebRender. It seems like that WebRender will not send correct NotifyDidRender if the content has an inner frame. The mochitest which is by using iframe is failed if I removed the timer of EnsureEventualDidPaintEvent (which is firing the MozAfterPaint in spite of the situation where gecko didn't end up render).

I tracked the several functions (RecvSetDisplay/RecvEmptyTransaction/NotifyDidRender/DidComposite), then gecko received same epoch number via NotifyDidRender and gecko doesn't get the iframe's epoch number.

-------------------------------------------------------------------------
 Logging result:
  1) SendDisplayList (Epoch:31 / Pipline namespace: 1 / Pipline handle:1) <- parent document.
  2) SendDisplayList (Epoch:3  / Pipline namespace: 1 / Pipline handle:2) <- frame document
  3) NotifyDidCompositeToPipeline (Epoch:31 / Pipline namespace: 1 / Pipline handle:1)    <- parent document.
  4) NotifyDidCompositeToPipeline (Epoch:31 / Pipline namespace: 1 / Pipline handle:1)    <- ????
-------------------------------------------------------------------------

I guess that something is wrong with the numbering of NotifyDidCompositeToPipeline.

The WebRenderBridgeParent manages this handling number own. In any case, this is another problem with it. I'll investigate this problem.
(In reply to Mantaroh Yoshinaga[:mantaroh] from comment #3)
> I guess that something is wrong with the numbering of
> NotifyDidCompositeToPipeline.
> 
Ah, This comment is wrong. In my environment(Ubuntu 17 on XPS13), test is failed just after launching the browser. If I open the new tab and go to mochitest's URL, test will success. It looks like that first window render will fail.

kats,
Do you know this phenomenon? (i.e. mochitest fail just after launching the browser.)
As far as I can see, gecko doesn't call the PaintRoot from content process, then render_backend didn't process anything.
Flags: needinfo?(bugmail)
(In reply to Mantaroh Yoshinaga[:mantaroh] from comment #4)
> kats,
> Do you know this phenomenon? (i.e. mochitest fail just after launching the
> browser.)
> As far as I can see, gecko doesn't call the PaintRoot from content process,
> then render_backend didn't process anything.

No, I haven't seen this. The way I usually run mochitests with webrender enabled on Linux is like so:
  MOZ_ACCELERATED=1 MOZ_WEBRENDER=1 ./mach mochitest <whatever> --keep-open=false
and that usually works well for me. If I want to run in headless mode I use xvfb-run with specific arguments [1]. Botond recently was having issues when he tried running mochitests in xvfb-run and we discovered that the screen depth argument is important because otherwise gecko doesn't find the right GL context to use for webrender and it causes badness.

[1] https://github.com/staktrace/moz-scripts/blob/master/xr#L3
Flags: needinfo?(bugmail)
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #5)
> (In reply to Mantaroh Yoshinaga[:mantaroh] from comment #4)
> > kats,
> > Do you know this phenomenon? (i.e. mochitest fail just after launching the
> > browser.)
> > As far as I can see, gecko doesn't call the PaintRoot from content process,
> > then render_backend didn't process anything.
> 
> No, I haven't seen this. The way I usually run mochitests with webrender
> enabled on Linux is like so:
>   MOZ_ACCELERATED=1 MOZ_WEBRENDER=1 ./mach mochitest <whatever>
> --keep-open=false
> and that usually works well for me. If I want to run in headless mode I use
> xvfb-run with specific arguments [1]. Botond recently was having issues when
> he tried running mochitests in xvfb-run and we discovered that the screen
> depth argument is important because otherwise gecko doesn't find the right
> GL context to use for webrender and it causes badness.
> 
> [1] https://github.com/staktrace/moz-scripts/blob/master/xr#L3

Thanks.
Webrender will not remove the 'blank' attribute of tabbrowser[1]. WebRender will not send initial MozAfterPaint(transaction id < 4) because the backend_render decided that has not rendering or scroll(SetDisplayList). As result of it, gecko doesn't process the initalization of tabbrowser, and gecko never remove the 'blank' attribute from tabbrowser.xml[2]. So webrender doesn't rendering anything. Before applying the patches of bug 1419226, gecko work this code due to MozAfterPaint of timer.

[1] https://searchfox.org/mozilla-central/rev/9a8d3f8191b15cdca89a7ce044c7bea2dd0462dc/browser/base/content/tabbrowser.xml#34
[2] https://searchfox.org/mozilla-central/rev/9a8d3f8191b15cdca89a7ce044c7bea2dd0462dc/browser/base/content/browser.js#1592
I thought about this a bit more. In all four of the failing cases, the animation is long-lasting (or infinite) and the lack of invalidation in WR is what causes us to go into an infinite repainting cycle. Non-WR detects that there is no visible change and so skips painting, which allows the reftest to end. In a way these tests are relying on optimizations within gecko in order to test for correct behaviour, which seems less than ideal. One thought I had was that in the test, we can set the animationPlayState to paused at the same time that we remove the reftest-wait. This should freeze the animation explicitly at that point, so even WR will stop repainting and the reftest snapshot can proceed.

This seems to work for me locally for two of the tests (transform/table-overflowed-by-animation.html and transform-3d/animate-preserve3d-parent.html) but the other two (css-animations/continuation-opacity.html and transform-3d/animate-preserve3d-child.html) still don't halt. I added some logging and the test never even gets to the code I added to pause the animations; it looks like the MozReftestInvalidate event never fires in the case of continuation-opacity.html and the animationiteration event never fires in the other case. I'm not yet sure why but I can continue looking into this tomorrow.
As for continuation-opacity.html, we shouldn't change play state during the test because the test specifies 'reftest-no-flush' which means the test expects there is no force flushes happen.  Whereas, unfortunately, changing play state causes a style flush.  That's said, I can't reproduce the failure locally.  It passes with the patch, weird, it seems to be a perma on the try.
OK, I can reproduce the failure locally, it needs to specify MOZ_REFTEST_VERBOSE=1, that means that sendSyncMessage() flushes queued something that affects MozAfterPaint events?
https://treeherder.mozilla.org/#/jobs?repo=try&revision=e44babffae26456d572a3a6f24a7e6b0393be053 is a try push where I paused the animations. All four of the tests passed, but revealed two other failures. I started another try push where I pause the animations in those as well: https://treeherder.mozilla.org/#/jobs?repo=try&revision=60742666446beb670d7dfe8d1afaad2d7313ef21

But yeah, if pausing the animation forces a flush and that's undesirable then that approach might not be viable.

I was also only able to repro the failure locally after specifying MOZ_REFTEST_VERBOSE=1, but I assumed it was just timing-related.
My new plan (to avoid the flushing problem) is to explicitly add a reftest-ignore-pending-paints class to the root element and have the harness check for that to ignore the pending paints. Try push with that: https://treeherder.mozilla.org/#/jobs?repo=try&revision=4e14c8e242877dafef54cf8c652ead9f928ca767
Better try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=e35a570c6a881f93eea206e354ec2a469b39262c (I screwed up the previous one, so I cancelled it).

In my local testing the only other problem I ran into was specific to the continuation-opacity test, which starts the animation and then waits for MozReftestInvalidate. As described at https://bugzilla.mozilla.org/show_bug.cgi?id=1447874#c0 we never get the MozReftestInvalidate for this test because of the animation. So I changed the test to wait for the load event instead of MozReftestInvalidate which should be pretty much equivalent for this case. It seems to be the only test that does this, but let's see how the try push turns out.
Seems good. Hiro, what do you think - is this a reasonable approach?
Ugh. Looks like for some reason transform-3d/animate-backface-hidden.html is failing with the patches but only on Windows. No idea why...
Comment on attachment 8972899 [details]
Bug 1437036 - Stop the reftest harness from waiting for MozAfterPaint during infinite/superlong animations.

Dropping flag until I figure out something that works.
Attachment #8972899 - Flags: review?(hikezoe)
I have opened bug 1459388 for continuation-opacity.html case, and now I suspect table-overflowed-by-animation.html is a similar case, inconsistent frame use.
Note that bug 1320608 should fix table-overflowed-by-animation.html failure.  I have no idea about the other two (animate-preserve3d-parent.html and animate-preserve3d-child.html) yet.  I am fine with workarounds for these two test cases for now.
I am wondering why table-overflowed-by-animation.html and continuation-opacity.html still need the workaround?
We probably don't need it for those tests. I just didn't remember which of your patches had landed and which were in my local tree so I left those in just to make sure. I just rebased and checked those two tests locally without the workaround - they pass. I'll do another try push to confirm but I think the patches are good to go, will put them up for review shortly.
Comment on attachment 8972900 [details]
Bug 1437036 - Implement GetLastTransactionId in WebRenderLayerManager.

https://reviewboard.mozilla.org/r/241452/#review249448
Attachment #8972900 - Flags: review?(hikezoe) → review+
Comment on attachment 8972899 [details]
Bug 1437036 - Stop the reftest harness from waiting for MozAfterPaint during infinite/superlong animations.

https://reviewboard.mozilla.org/r/241450/#review249452

This change looks reasonable to me (though I am pretty sure ib-split-sibling-opacity.html should be work as it is since it has the same characteristics continuation-opacity.html), but unfortunately animate-preserve3d-child.html and animate-preserve3d-parent.html still failed.
Comment on attachment 8972899 [details]
Bug 1437036 - Stop the reftest harness from waiting for MozAfterPaint during infinite/superlong animations.

https://reviewboard.mozilla.org/r/241450/#review249454

The change looks reasonable to me.  (though I am pretty sure ib-split-sibling-opacity.html should work as it is, since it has the same characteristics of continuation-opacity.html.)  But unfortunately animate-preserve3d-child.html and animate-preserve3d-parent.html still failed on Win64 in the try.  So I am pending this review.

::: layout/tools/reftest/reftest-content.js:437
(Diff revision 2)
> +           contentRootElement.getAttribute('class').split(/\s+/)
> +                             .includes("reftest-ignore-pending-paints");

Could we add a check that the flag should be used for documents that have animations.  It would be something like this.

if (reftest_ignore_pending_paints_is_specifed &&
    contentRootElement.getAnimations().length != 0) {
  LogError("`reftest-ignore-pending-paints` should only specify for document with animations");
}

Note that reftest harness doesn't have LogError yet, we have to add it too though.
Do'h, I thought I have lost comment 29 somewhere, but actually it was posted. :)
There's a few problems still. One is that I need to add another rAF between the MozAfterPaint and clearing the reftest-wait, so that it actually advances consistently to the frame that we want to snapshot.

Another problem with animate-preserve3d-child.html is that it doesn't even get past the STATE_WAITING_TO_FIRE_INVALIDATE_EVENT. The animations start out running and never stop. Other tests that do this have reftest-no-flush but this one does not. Is it acceptable to add reftest-no-flush to this test?

The -parent.html test is better because the animation starts paused and then it explicitly switches the animation to the running state, but there is still a race condition here - if the animationstart event fires before the reftest gets past the WAITING_TO_FIRE_INVALIDATE_EVENT state, then it will get stuck as well. In this test we can fix this by waiting for both the MozReftestInvalidate and animationstart events, and only make the animation run after we get both. Or we can use reftest-no-flush. What do you think?
Note also that animate-backface-hidden already has reftest-no-flush so it doesn't have that second problem.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #32)
> There's a few problems still. One is that I need to add another rAF between
> the MozAfterPaint and clearing the reftest-wait, so that it actually
> advances consistently to the frame that we want to snapshot.
> 
> Another problem with animate-preserve3d-child.html is that it doesn't even
> get past the STATE_WAITING_TO_FIRE_INVALIDATE_EVENT. The animations start
> out running and never stop. Other tests that do this have reftest-no-flush
> but this one does not. Is it acceptable to add reftest-no-flush to this test?
> 
> The -parent.html test is better because the animation starts paused and then
> it explicitly switches the animation to the running state, but there is
> still a race condition here - if the animationstart event fires before the
> reftest gets past the WAITING_TO_FIRE_INVALIDATE_EVENT state, then it will
> get stuck as well. In this test we can fix this by waiting for both the
> MozReftestInvalidate and animationstart events, and only make the animation
> run after we get both. Or we can use reftest-no-flush. What do you think?

Yeah, using reftest-no-flush in those tests is fine.  For reference, animations in those tests don't run on the compositor yet (bug 779598), so reftest-no-flush shouldn't affect the tests.  Once bug 779598 is fixed, we can also drop 'reftest-no-flush' at that time.

And your comments noticed me that the real cause of these failures is probably SchedulePaint call [1] in AddAnimationsForProperty().  The SchedulePaint call leads to marking isMozAfterPaintPending (nsRefreshDriver::mViewManagerFlushIsPending) true during painting process, thus when we check isMozAfterPaintPending in a callback of setTimeout, the value always is true.  So there is no chance that isMozAfterPaintPending is false.

[1] https://hg.mozilla.org/mozilla-central/file/a7461494a7a0/layout/painting/nsDisplayList.cpp#l642
Comment on attachment 8972899 [details]
Bug 1437036 - Stop the reftest harness from waiting for MozAfterPaint during infinite/superlong animations.

https://reviewboard.mozilla.org/r/241450/#review249480

r=me.  On Android two reftests still fail on the try but I think we can disable them and land these fixes for now.
Attachment #8972899 - Flags: review?(hikezoe) → review+
(In reply to Hiroyuki Ikezoe (:hiro) from comment #38)
> Comment on attachment 8972899 [details]
> Bug 1437036 - Stop the reftest harness from waiting for MozAfterPaint during
> infinite/superlong animations.
> 
> https://reviewboard.mozilla.org/r/241450/#review249480
> 
> r=me.  On Android two reftests still fail on the try but I think we can
> disable them and land these fixes for now.

Ok, thanks. It's just the one reftest that's still failing on android, animate-preserve3d-child.html. I did some more try pushes over the weekend and discovered that if I remove the MozAfterPaint wait that fixes it on Android but breaks it on Linux. So yeah, I'd rather just skip the test on Android and file a follow-up bug to deal with that.
Pushed by kgupta@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/7914da4c2831
Stop the reftest harness from waiting for MozAfterPaint during infinite/superlong animations. r=hiro
https://hg.mozilla.org/integration/autoland/rev/37746bbf1d13
Implement GetLastTransactionId in WebRenderLayerManager. r=hiro
https://hg.mozilla.org/mozilla-central/rev/7914da4c2831
https://hg.mozilla.org/mozilla-central/rev/37746bbf1d13
Status: NEW → RESOLVED
Closed: Last year
Resolution: --- → FIXED
Target Milestone: --- → mozilla62
You need to log in before you can comment on or make changes to this bug.