Closed Bug 1419226 Opened 2 years ago Closed 2 years ago

Remove the timer of MozAfterPaint which brings intermittent test fail of animation.

Categories

(Core :: Layout, enhancement, P3)

enhancement

Tracking

()

RESOLVED FIXED
mozilla61
Tracking Status
firefox59 --- wontfix
firefox60 --- wontfix
firefox61 --- fixed

People

(Reporter: mantaroh, Assigned: mantaroh)

References

Details

Attachments

(2 files, 3 obsolete files)

Several tests in animation mochitest uses paint_listener for waiting paint event.[1][2] This paint listener doesn't use transactionId now(for detail, see bug 1264409), so this brings several problems for animation tests.

 * Test might success even if received the wrong paint event.
 * If compositor is delay, paint event of software timer version[2] fired, and test of related with compositor will fail.

This bug will introduce transaction id for paint listener in order to use it on animation tests.
This patch added transaction id into animation tests:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=d8040299be0b41f1109d9202d88dd295d9a99fa1

However this patch will not fix an intermittent bug(bug 1395971), I need to work following task in order to fix bug 1395971 after landing this:

 * Delete the timer of ensuring eventual did paint event.
 * Modify tests which use the paint listener(i.e. should use transaction id).
   * Especially,  several APZ and Scroll tests fail due to doesn't occur a painting.
(In reply to Mantaroh Yoshinaga[:mantaroh] from comment #0)
> Several tests in animation mochitest uses paint_listener for waiting paint
> event.[1][2] This paint listener doesn't use transactionId now(for detail,
> see bug 1264409), so this brings several problems for animation tests.
> 
>  * Test might success even if received the wrong paint event.
>  * If compositor is delay, paint event of software timer version[2] fired,
> and test of related with compositor will fail.
> 
> This bug will introduce transaction id for paint listener in order to use it
> on animation tests.

Ah, Sorry for confusion. Bug 1302071 has already fixed the bug 1341294[1]. So I think htat we might not need to check the transaction id on paint_listener.js. I guess the root cause of intermittent test fail(bug 1395971) is software timer of MozAfterPaint. So I will work on it in this bug.


[1] https://hg.mozilla.org/mozilla-central/rev/a535bf3bfff1#l2.170
If that's the case, please mark bug 1341294 as a dupe of bug 1302071 linking to the relevant changeset.
(In reply to Brian Birtles (:birtles) from comment #3)
> If that's the case, please mark bug 1341294 as a dupe of bug 1302071 linking
> to the relevant changeset.

Thanks!
I've marked it as a dupe.

I'll remove EnsureEventualDidPaintEvent and modify the related code in this bug.
This function will create the timer which fires the MozAfterPaint after attached the MozAfterPaint listener[1]

[1] https://searchfox.org/mozilla-central/rev/919dce54f43356c22d6ff6b81c07ef412b1bf933/layout/base/nsPresContext.cpp#3468-3472
Component: DOM: Animation → Layout
No longer depends on: 1341294
Summary: Introduce transaction id for paint listerner for using it in animation mochitest. → Remove the timer of MozAfterPaint which brings intermittent test fail of animation.
Comment on attachment 8930792 [details]
Bug 1419226 - Part 2. Wait for MozAfterPaint instead of waiting for frames on file_deferred_start.html .

https://reviewboard.mozilla.org/r/201446/#review207198

::: dom/animation/test/mozilla/file_deferred_start.html:100
(Diff revision 1)
> -  // for paints and only then do we commence the test. Even doing that, this
> -  // test can sometimes pass when it should not due to a stray paint. Most of
> -  // the time, however, it will correctly fail so hopefully even if we do
> +  // that, this test can sometimes pass when it should not due to a stray paint.
> +  // Most of the time, however, it will correctly fail so hopefully even if we
> +  // do occasionally produce a false negative on one platform, another platform
> -  // occasionally produce a false negative on one platform, another platform
>    // will fail as expected.
> -  return waitForDocLoad().then(() => waitForIdle())
> +  return waitForPaints().then(() => {

I thinks we should still wait for document load.  I'd suggest to wait for document load only in the first test case and drop other usage in other test cases.  What do you think?

::: dom/animation/test/mozilla/file_deferred_start.html:105
(Diff revision 1)
> -    // TODO: Current waitForPaint() will not wait for MozAfterPaint in this
> -    // case(Bug 1341294), so this waiting code is workaround for it.
> +      // TODO : This test might run incorrectly due to fire MozAfterPaint based
> +      //        on software timer(100ms). For detail, see bug 1395971.

I don't quite understand this comment.  Could you please elaborate it?  What will happen in bug 1395971?

::: dom/animation/test/mozilla/file_restyles.html:70
(Diff revision 1)
>        resolve(stylingMarkers);
>      });
>    });
>  }
>  
> -function ensureElementRemoval(aElement) {
> +function ensureElementRemoval(aElement, shouldWaitForPaint = true) {

Using boolean argument in multiple argumensts is generally a bad idea.  We shouldn't do it.

::: dom/animation/test/style/file_missing-keyframe-on-compositor.html:160
(Diff revision 1)
> +    SpecialPowers.DOMWindowUtils.advanceTimeAndRefresh(50 * MS_PER_SEC);
>      return waitForPaintsFlushed();
>    }).then(() => {
> -    SpecialPowers.DOMWindowUtils.advanceTimeAndRefresh(50 * MS_PER_SEC);
> -

This change is totally wrong.  Before this change, the test assumes;

1) an animation is sent to the compositor
2) call pause(), thus the animation is updated on the main thread
3) then the animation is re-sent to the compositor
4) advance 50ms on the compositor
5) check the value on the compositor

After this change the time is advanced when the animation is on the main thead, right?
Attachment #8930792 - Flags: review?(hikezoe) → review-
If the fallback timer is now only needed by print preview, rather than leaving it in nsPresContext, it seems better to move it to print preview code?
(In reply to Brian Birtles (:birtles) from comment #11)
> If the fallback timer is now only needed by print preview, rather than
> leaving it in nsPresContext, it seems better to move it to print preview
> code?

Thanks.
I try to remove the timer from nsPresContxt and add this timer to the print preview code. It seems to work correctly in my local environment. So I updated this patches.
Comment on attachment 8930792 [details]
Bug 1419226 - Part 2. Wait for MozAfterPaint instead of waiting for frames on file_deferred_start.html .

https://reviewboard.mozilla.org/r/201446/#review207198

Thank you for the review.

> I thinks we should still wait for document load.  I'd suggest to wait for document load only in the first test case and drop other usage in other test cases.  What do you think?

Yes, I think it will clarify the test. So I drop this change.

> I don't quite understand this comment.  Could you please elaborate it?  What will happen in bug 1395971?

Oh,, Sory. This comment is no longer necessary in this change.

> Using boolean argument in multiple argumensts is generally a bad idea.  We shouldn't do it.

OK. I use element.remove instead of using this flag.

> This change is totally wrong.  Before this change, the test assumes;
> 
> 1) an animation is sent to the compositor
> 2) call pause(), thus the animation is updated on the main thread
> 3) then the animation is re-sent to the compositor
> 4) advance 50ms on the compositor
> 5) check the value on the compositor
> 
> After this change the time is advanced when the animation is on the main thead, right?

Thanks. This change is not need for this bug. This was for previous transaction id.
This test will pass even without changing this test.
Comment on attachment 8930792 [details]
Bug 1419226 - Part 2. Wait for MozAfterPaint instead of waiting for frames on file_deferred_start.html .

https://reviewboard.mozilla.org/r/201446/#review207278

::: commit-message-a3c8b:5
(Diff revision 3)
> +Bug 1419226 - Part 3.Wait for non-timer MozAfterPaint on an animation mochitest.r?hiro
> +
> +This patch will:
> + * Wait for MozAfterPaint without waiting for another event in test_deferred_start.html.
> + * Remove waiting for MozAfterPaint if paint process doesn't occur in test_restyles.html.

Could you please check when MozAfterPaint is not fired exactly?

Initially I was thinking that removing an element is scrolled-out does not fire MozAfterPaint.  I think this is correct.  But those scrolled-out element is inside another element which has basically scrollbar, so removing the parent should fire MozAfterPaint, right?  Also I am wondering whether removing visibility:hidden element does fire MozAfterPaint or not, since visibility:hidden affects layout,  I guess it should fire (if reflow happens?).
Attachment #8930792 - Flags: review?(hikezoe) → review-
(In reply to Mantaroh Yoshinaga[:mantaroh] from comment #17)
> Comment on attachment 8930792 [details]
> Bug 1419226 - Part 3.Wait for non-timer MozAfterPaint on an animation
> mochitest.
> 
> https://reviewboard.mozilla.org/r/201446/#review207198
> 
> Thank you for the review.
> 
> > I thinks we should still wait for document load.  I'd suggest to wait for document load only in the first test case and drop other usage in other test cases.  What do you think?
> 
> Yes, I think it will clarify the test. So I drop this change.

I think we should change them in this bug.  What I was concerned is that DidComposite happens when nobody listens MozAfterPaint event during waiting for document load.


  const div = addDiv(t, { class: 'target' });

  return waitForDocLoad().then(() => waitForPaints())

  .then(() => {

With this setup, a paint process for 'addDiv' might happens while waiting for document load, right?  So this setup should be

  return waitForDocLoad().then(() => {
    const div = addDiv(t, { class: 'target' });
    return waitForPaints();
  })
Comment on attachment 8930790 [details]
Bug 1419226 - Part 1.Change observing target window of MozAfterPaint.

https://reviewboard.mozilla.org/r/201442/#review207554

Thanks! This probably took care of some bugs because we were attaching the event handler on the wrong window. :/

::: toolkit/content/browser-content.js:594
(Diff revision 2)
>        // will wait for MozAfterPaint event to be fired.
>        let webProgressListener = {
>          onStateChange(webProgress, req, flags, status) {
>            if (flags & Ci.nsIWebProgressListener.STATE_STOP) {
>              webProgress.removeProgressListener(webProgressListener);
> -            let domUtils = content.QueryInterface(Ci.nsIInterfaceRequestor)
> +            let domUtils = contentWindow.QueryInterface(Ci.nsIInterfaceRequestor)

Oof, good catch!
Attachment #8930790 - Flags: review?(mconley) → review+
Comment on attachment 8930791 [details]
Bug 1419226 - Part 2. Remove MozAfterPaint timer and firing MozAfterPaint when revoke transaction.

https://reviewboard.mozilla.org/r/201444/#review207616
Attachment #8930791 - Flags: review?(matt.woodrow) → review+
Comment on attachment 8930792 [details]
Bug 1419226 - Part 2. Wait for MozAfterPaint instead of waiting for frames on file_deferred_start.html .

https://reviewboard.mozilla.org/r/201446/#review207278

Thanks hiro,

I addressed tests code: 
 * drop file_restyles change.
 * change the timing of creating div element.

> Could you please check when MozAfterPaint is not fired exactly?
> 
> Initially I was thinking that removing an element is scrolled-out does not fire MozAfterPaint.  I think this is correct.  But those scrolled-out element is inside another element which has basically scrollbar, so removing the parent should fire MozAfterPaint, right?  Also I am wondering whether removing visibility:hidden element does fire MozAfterPaint or not, since visibility:hidden affects layout,  I guess it should fire (if reflow happens?).

Sorry, this test is same to test_missing-keyframe-on-compoistor.html. i.e. we don't need to modify on this bug.
Comment on attachment 8930792 [details]
Bug 1419226 - Part 2. Wait for MozAfterPaint instead of waiting for frames on file_deferred_start.html .

https://reviewboard.mozilla.org/r/201446/#review208038

As I suggested in comment 10, we just need to wait for document load in the *first* test case.  Rest of test cases shouldn't wait for the document load since they are promise test, right?

r=me with that change (comments should also be modified).
Attachment #8930792 - Flags: review?(hikezoe) → review+
Blocks: 1415783
Blocks: 1395971
No longer depends on: 1395971
Comment on attachment 8930792 [details]
Bug 1419226 - Part 2. Wait for MozAfterPaint instead of waiting for frames on file_deferred_start.html .

https://reviewboard.mozilla.org/r/201446/#review208038

Thank you.
I changed to waiting for document load, then create div element after it in first test case only.
Pushed by mantaroh@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/21d9bedcf411
Part 1.Change observing target window of MozAfterPaint. r=mconley
https://hg.mozilla.org/integration/autoland/rev/8cda3fb3ce1a
Part 2. Remove notify did paint timer. r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/5a2460c34657
Part 3.Wait for MozAfterPaint instead of waiting for frames on file_deferred_start.html . r=hiro
(In reply to Mantaroh Yoshinaga[:mantaroh] from comment #31)
> Try:
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=1491909d4d02e3005643013bc421367f5e1e45a4&group_state=e
> xpanded

I suspect test_bug574663.html is affected by this bug since it uses MozAfterPaint and the intermittent failure (bug 1324160) has never happened for a year. 

https://treeherder.mozilla.org/#/jobs?repo=try&revision=1491909d4d02e3005643013bc421367f5e1e45a4&group_state=expanded&selectedJob=147746596
Flags: needinfo?(mantaroh)
(In reply to Sebastian Hengst [:aryx][:archaeopteryx] (needinfo on intermittent or backout) from comment #36)
> And these intermittent reftest failures:
>
> https://treeherder.mozilla.org/logviewer.html#?job_id=147773530&repo=autoland
> https://treeherder.mozilla.org/logviewer.html#?job_id=147790611&repo=autoland

Thanks.

This reason is that 'apz-repaints-flushed' events didn't happen.[1] I guess wpt failure is same of this. In any way, I'll look into this more.

[1] https://searchfox.org/mozilla-central/rev/7a8c667bdd2a4a32746c9862356e199627c0896d/layout/tools/reftest/reftest-content.js#680
Flags: needinfo?(mantaroh)
Priority: -- → P3
(In reply to Mantaroh Yoshinaga[:mantaroh] from comment #38)
> (In reply to Sebastian Hengst [:aryx][:archaeopteryx] (needinfo on
> intermittent or backout) from comment #36)
> > And these intermittent reftest failures:
> >
> > https://treeherder.mozilla.org/logviewer.html#?job_id=147773530&repo=autoland
> > https://treeherder.mozilla.org/logviewer.html#?job_id=147790611&repo=autoland
> 
> Thanks.
> 
> This reason is that 'apz-repaints-flushed' events didn't happen.[1] I guess
> wpt failure is same of this. In any way, I'll look into this more.
> 
> [1]
> https://searchfox.org/mozilla-central/rev/
> 7a8c667bdd2a4a32746c9862356e199627c0896d/layout/tools/reftest/reftest-
> content.js#680

Oh,, Sorry. This is wrong. Timeout is occurred in next step, i.e. step of waiting to finish. Local test failure log is follow:

REFTEST INFO | Saved log: [CONTENT] MakeProgress: STATE_WAITING_FOR_APZ_FLUSH
REFTEST INFO | Saved log: [CONTENT] AfterPaintListener in file:///home/mantaroh/mozilla-central/layout/reftests/border-image/470250-2.html
REFTEST INFO | Saved log: [CONTENT] SendUpdateCanvasForEvent with 0 rects
REFTEST INFO | Saved log: Updating canvas for invalidation
REFTEST INFO | Saved log: [CONTENT] MakeProgress: apz-repaints-flushed fired
REFTEST INFO | Saved log: [CONTENT] MakeProgress: STATE_WAITING_TO_FINISH
REFTEST INFO | Saved log: [CONTENT] MakeProgress: waiting for MozAfterPaint
REFTEST INFO | Saved log: [CONTENT] MakeProgress: STATE_WAITING_TO_FINISH
REFTEST INFO | Saved log: [CONTENT] MakeProgress: waiting for MozAfterPaint
REFTEST INFO | Saved log: [CONTENT] AfterPaintListener in file:///home/mantaroh/mozilla-central/layout/reftests/border-image/470250-2.html
REFTEST INFO | Saved log: [CONTENT] SendUpdateCanvasForEvent with 1 rects
REFTEST INFO | Saved log: [CONTENT] Rect: 8 8 60 112
REFTEST INFO | Saved log: Updating canvas for invalidation
REFTEST INFO | Saved log: DoDrawWindow 8,8,52,104
REFTEST INFO | Saved log: [CONTENT] MakeProgress: STATE_WAITING_TO_FINISH
REFTEST INFO | Saved log: [CONTENT] MakeProgress: waiting for MozAfterPaint
REFTEST TEST-END | file:///home/mantaroh/mozilla-central/layout/reftests/border-image/470250-2.html == file:///home/mantaroh/mozilla-central/layout/reftests/border-image/470250-2-ref.html
++DOMWINDOW == 10 (0x7fbcc8073c00) [pid = 6575] [serial = 872] [outer = 0x7fbcd9a25470]

reftest.jsm will wait for mozAfterPaint after updating canvas, however it didn't receive MozAfterPaint after updating canvas in this case.

[1] https://searchfox.org/mozilla-central/rev/7a8c667bdd2a4a32746c9862356e199627c0896d/layout/tools/reftest/reftest-content.js#704
(In reply to Mantaroh Yoshinaga[:mantaroh] from comment #39)
> (In reply to Mantaroh Yoshinaga[:mantaroh] from comment #38)
> > (In reply to Sebastian Hengst [:aryx][:archaeopteryx] (needinfo on
> > intermittent or backout) from comment #36)
> > > And these intermittent reftest failures:
> > >
> > > https://treeherder.mozilla.org/logviewer.html#?job_id=147773530&repo=autoland
> > > https://treeherder.mozilla.org/logviewer.html#?job_id=147790611&repo=autoland
> > 
> > Thanks.
> > 
> > This reason is that 'apz-repaints-flushed' events didn't happen.[1] I guess
> > wpt failure is same of this. In any way, I'll look into this more.
> > 
> > [1]
> > https://searchfox.org/mozilla-central/rev/
> > 7a8c667bdd2a4a32746c9862356e199627c0896d/layout/tools/reftest/reftest-
> > content.js#680
> 
> Oh,, Sorry. This is wrong. Timeout is occurred in next step, i.e. step of
> waiting to finish. Local test failure log is follow:
> 
> REFTEST INFO | Saved log: [CONTENT] MakeProgress: STATE_WAITING_FOR_APZ_FLUSH
> REFTEST INFO | Saved log: [CONTENT] AfterPaintListener in
> file:///home/mantaroh/mozilla-central/layout/reftests/border-image/470250-2.
> html
> REFTEST INFO | Saved log: [CONTENT] SendUpdateCanvasForEvent with 0 rects
> REFTEST INFO | Saved log: Updating canvas for invalidation
> REFTEST INFO | Saved log: [CONTENT] MakeProgress: apz-repaints-flushed fired
> REFTEST INFO | Saved log: [CONTENT] MakeProgress: STATE_WAITING_TO_FINISH
> REFTEST INFO | Saved log: [CONTENT] MakeProgress: waiting for MozAfterPaint
> REFTEST INFO | Saved log: [CONTENT] MakeProgress: STATE_WAITING_TO_FINISH
> REFTEST INFO | Saved log: [CONTENT] MakeProgress: waiting for MozAfterPaint
> REFTEST INFO | Saved log: [CONTENT] AfterPaintListener in
> file:///home/mantaroh/mozilla-central/layout/reftests/border-image/470250-2.
> html
> REFTEST INFO | Saved log: [CONTENT] SendUpdateCanvasForEvent with 1 rects
> REFTEST INFO | Saved log: [CONTENT] Rect: 8 8 60 112
> REFTEST INFO | Saved log: Updating canvas for invalidation
> REFTEST INFO | Saved log: DoDrawWindow 8,8,52,104
> REFTEST INFO | Saved log: [CONTENT] MakeProgress: STATE_WAITING_TO_FINISH
> REFTEST INFO | Saved log: [CONTENT] MakeProgress: waiting for MozAfterPaint
> REFTEST TEST-END |
> file:///home/mantaroh/mozilla-central/layout/reftests/border-image/470250-2.
> html ==
> file:///home/mantaroh/mozilla-central/layout/reftests/border-image/470250-2-
> ref.html
> ++DOMWINDOW == 10 (0x7fbcc8073c00) [pid = 6575] [serial = 872] [outer =
> 0x7fbcd9a25470]
> 
> reftest.jsm will wait for mozAfterPaint after updating canvas, however it
> didn't receive MozAfterPaint after updating canvas in this case.
> 
> [1]
> https://searchfox.org/mozilla-central/rev/
> 7a8c667bdd2a4a32746c9862356e199627c0896d/layout/tools/reftest/reftest-
> content.js#704

I looked into this phenomenon, I guess that we should ensure this event even if display list invalidation doesn't invalidate. (i.e. isMozAfterPaintPending return true since view manager of refresh driver has pending flush, however, this event didn't happen[2])

[1] https://searchfox.org/mozilla-central/rev/7a8c667bdd2a4a32746c9862356e199627c0896d/layout/tools/reftest/reftest-content.js#431
[2] https://searchfox.org/mozilla-central/rev/7a8c667bdd2a4a32746c9862356e199627c0896d/layout/base/nsPresContext.cpp#185

I think that we should add software timer like a print preview into reftest-content.js.

This patch will add software timer which ensures the MozAfterPaint:
https://hg.mozilla.org/try/rev/e5b1466c0da5e4dc4a58e70ba278ba802c4ae7c1

In my local environment, this failure doesn't happen. (try num is 200000). I'll confirm other test cases as well.
No longer blocks: 1415783
I agree that adding a fallback timer in JS in the reftest harness is preferable to adding it in C++ (where it will affect all tests, including those that don't need/expect it).

One area we might need to check is whether the timer added to the reftest harness will affect crashtests too. I assume it will and it probably is better if it does (since existing crashtests might be relying on that) but we should check.
(In reply to Mantaroh Yoshinaga[:mantaroh] from comment #40)
> I looked into this phenomenon, I guess that we should ensure this event even
> if display list invalidation doesn't invalidate. (i.e.
> isMozAfterPaintPending return true since view manager of refresh driver has
> pending flush, however, this event didn't happen[2])

I am wondering the mismatch between isMozAfterPaintPending flag and MozAfterPaint event is not a problem?  It seems to me that the mismatch might affect mochitest and other tests as well, no?
(In reply to Mantaroh Yoshinaga[:mantaroh] from comment #40)
> I looked into this phenomenon, I guess that we should ensure this event even
> if display list invalidation doesn't invalidate. (i.e.
> isMozAfterPaintPending return true since view manager of refresh driver has
> pending flush, however, this event didn't happen[2])

Oh wait.  I think Mantaroh was misunderstanding here (or, just wrote a different comment what he had in mind).

As per the log in comment 39, MozAfterPaint event was surely fired,  the reftest got stuck while waiting for clearing isMozAfterPaintPending.  Anyway we have to know how this mismatch happens, the mismatch is not intuitive to me.
(In reply to Hiroyuki Ikezoe (:hiro) from comment #43)
> (In reply to Mantaroh Yoshinaga[:mantaroh] from comment #40)
> > I looked into this phenomenon, I guess that we should ensure this event even
> > if display list invalidation doesn't invalidate. (i.e.
> > isMozAfterPaintPending return true since view manager of refresh driver has
> > pending flush, however, this event didn't happen[2])
> 
> Oh wait.  I think Mantaroh was misunderstanding here (or, just wrote a
> different comment what he had in mind).
> 
> As per the log in comment 39, MozAfterPaint event was surely fired,  the
> reftest got stuck while waiting for clearing isMozAfterPaintPending.  Anyway
> we have to know how this mismatch happens, the mismatch is not intuitive to
> me.

As commented in comment 39, I guess there are cases that doesn't occur the MozAfterPaint even if isMozAfterPaintPending returned true.[1] For example case, nsIWidgetListener of view manager revoked the pending flush, then view manager didn't paint since damaged region is empty.
In this case, I think that isMozAfterPaintPending will return true, then MozAfterPaintPending will not happen.

[1] https://searchfox.org/mozilla-central/rev/9f3bd430c2b132c86c46126a0548661de876799a/view/nsViewManager.cpp#1097

Please correct me, if I am mistaken.
Flags: needinfo?(hikezoe)
(In reply to Brian Birtles (:birtles) from comment #41)
> I agree that adding a fallback timer in JS in the reftest harness is
> preferable to adding it in C++ (where it will affect all tests, including
> those that don't need/expect it).
> 
> One area we might need to check is whether the timer added to the reftest
> harness will affect crashtests too. I assume it will and it probably is
> better if it does (since existing crashtests might be relying on that) but
> we should check.

Thanks!
I'll check crash tests as well, and I think that I need to change the Marionette harness(lisetener.js) due to same reason.
(https://searchfox.org/mozilla-central/source/testing/marionette/listener.js#1811)
Clearing ni? to me since I did ask Mantaroh on IRC to elaborate what's going on there in the failure case.

What I can tell from the log in comment 39 is reftest tools does not wait for MozAfterPaint event after taking the snapshot (although the log is 'waiting for MozAfterPaint').  It just waits for clearing isMozAfterPaintPending flag.  I've verified this fact locally to run the failure reftest (470250-2.html).  So this is the question what I've been talking.  Why isMozAfterPaintPending flag remains true?  Doesn't it make other tests failure?  Also why does 470250-2.html fail intermittently?
Flags: needinfo?(hikezoe)
Here's timer:
https://dxr.mozilla.org/mozilla-central/rev/c2248f85346939d3e0b01f57276c440ccb2d16a1/layout/base/PresShell.cpp#3738

I guess that a test waits for the MozAfterPaint forever, if this timer is fired between the Tick() and IsMozAfterPaintPending(). 
I.e. this timer set mViewManagerFlushIsPending to true before next Tick(), then test script ask isMozAfterPaintPending().
(In reply to Mantaroh Yoshinaga[:mantaroh] from comment #47)
> Here's timer:
> https://dxr.mozilla.org/mozilla-central/rev/
> c2248f85346939d3e0b01f57276c440ccb2d16a1/layout/base/PresShell.cpp#3738
> 
> I guess that a test waits for the MozAfterPaint forever,

I wanted to know the fact, not a guess.  Anyway, does 'the MozAfterPaint' mean MozAfterPaint *event*?  I think it's not.
You mean isMozAterPaintPending *flag*, more precisely reftest tools waits for being isMozAfterPaintPending flag cleared [1], right?

[1] https://hg.mozilla.org/mozilla-central/file/38f49346a200/layout/tools/reftest/reftest-content.js#l428
(In reply to Hiroyuki Ikezoe (:hiro) from comment #48)
> (In reply to Mantaroh Yoshinaga[:mantaroh] from comment #47)
> > Here's timer:
> > https://dxr.mozilla.org/mozilla-central/rev/
> > c2248f85346939d3e0b01f57276c440ccb2d16a1/layout/base/PresShell.cpp#3738
> > 
> > I guess that a test waits for the MozAfterPaint forever,
> 
> I wanted to know the fact, not a guess.  Anyway, does 'the MozAfterPaint'
> mean MozAfterPaint *event*?  I think it's not.
> You mean isMozAterPaintPending *flag*, more precisely reftest tools waits
> for being isMozAfterPaintPending flag cleared [1], right?
> 

isMozAfterPaintPending is not a flag like bool member.
It checks every time :
  * PresContext has unfired MozAfterPaint event(i.e. pending this event) or not.
  * RootPresContext's view manager has the pending flush or not.

In this test failure, the second condition will be true when received MozAfterPaint event, then reftest's status will be 'STATE_WAITING_TO_FINISH' which is waiting for MozAfterPaint event again[1]. Your mentioned check[2] will not start unless receiving the MozAfterPaint event.

If the timer of comment 47 will fire, the second condition of isMozAfterPaintPending will be true due to calling the ScheduleViewManagerFlush()[3]. Then reftest call the isMozAfterPaintPending after this timer finished and status of reftest will be the waiting for the MozAfterPaint event again. 

[1] https://searchfox.org/mozilla-central/rev/be78e6ea9b10b1f5b2b3b013f01d86e1062abb2b/layout/tools/reftest/reftest-content.js#704
[2] https://searchfox.org/mozilla-central/rev/be78e6ea9b10b1f5b2b3b013f01d86e1062abb2b/layout/tools/reftest/reftest-content.js#431
[3] https://searchfox.org/mozilla-central/rev/be78e6ea9b10b1f5b2b3b013f01d86e1062abb2b/layout/base/PresShell.cpp#3751
I am really confused.  If you are right, what happened the log case in comment 39?

REFTEST INFO | Saved log: [CONTENT] SendUpdateCanvasForEvent with 1 rects
REFTEST INFO | Saved log: [CONTENT] Rect: 8 8 60 112
REFTEST INFO | Saved log: Updating canvas for invalidation
REFTEST INFO | Saved log: DoDrawWindow 8,8,52,104
REFTEST INFO | Saved log: [CONTENT] MakeProgress: STATE_WAITING_TO_FINISH
REFTEST INFO | Saved log: [CONTENT] MakeProgress: waiting for MozAfterPaint

The last line shows us the fact that reftest tools reached inside 'if (shouldWaitForPendingPaints())' block [1].  What does it mean?  If your explanation is correct, the log is not a failure case?

[1] https://hg.mozilla.org/mozilla-central/file/38f49346a200/layout/tools/reftest/reftest-content.js#l603
(In reply to Hiroyuki Ikezoe (:hiro) from comment #50)
> I am really confused.  If you are right, what happened the log case in
> comment 39?
> 
> REFTEST INFO | Saved log: [CONTENT] SendUpdateCanvasForEvent with 1 rects
> REFTEST INFO | Saved log: [CONTENT] Rect: 8 8 60 112
> REFTEST INFO | Saved log: Updating canvas for invalidation
> REFTEST INFO | Saved log: DoDrawWindow 8,8,52,104
> REFTEST INFO | Saved log: [CONTENT] MakeProgress: STATE_WAITING_TO_FINISH
> REFTEST INFO | Saved log: [CONTENT] MakeProgress: waiting for MozAfterPaint
> 
> The last line shows us the fact that reftest tools reached inside 'if
> (shouldWaitForPendingPaints())' block [1].  What does it mean?  If your
> explanation is correct, the log is not a failure case?
> 
> [1]
> https://hg.mozilla.org/mozilla-central/file/38f49346a200/layout/tools/
> reftest/reftest-content.js#l603

As explained to you on F2F, this timeout intermittent mechanism is as follow:

1) 'reftest-content.js' received the 'MozAfterPaint' event when its status is the 'STATE_WAITING_TO_FINISH', then its 'MozAfterEvent' event handler set timeout for running 'MakeProgress' again.[1]

2) A timer of delayed compress paint happen here[2], then set 'mViewManagerFlushIsPending' of refresh driver to true. This flag is used by 'isMozAfterPaintPending()'.

3) A timer of step 1) happen here, then check whether a 'isMozAfterPaintPending' is true or not.[3]
   In this step, 'isMozAfterPaintPending' returns true due to step 2).
   Then 'reftest-content.js' went into the 'STATE_WAITING_TO_FINISH' state.

4) MozAfterPaint event doesn't happen never after this step, then test will timeout.

[1] https://dxr.mozilla.org/mozilla-central/rev/ba283baf4e98aa3a5f45a17981077b98697aa73a/layout/tools/reftest/reftest-content.js#549
[2] https://dxr.mozilla.org/mozilla-central/rev/ba283baf4e98aa3a5f45a17981077b98697aa73a/layout/base/PresShell.cpp#3751
[3] https://dxr.mozilla.org/mozilla-central/rev/ba283baf4e98aa3a5f45a17981077b98697aa73a/layout/base/nsPresContext.cpp#185

I think that we might not need to wait 'MozAfterPaint' event in this step(i.e. STATE_WAITING_TO_FINISH) since target reftests removed 'reftest-wait' attribute in this step. Or.. We need to drop a timer of above step 2), but I'm not sure that to remove this timer affect other code now.
Thanks for the explanation!

(In reply to Mantaroh Yoshinaga[:mantaroh] from comment #51)
> 3) A timer of step 1) happen here, then check whether a
> 'isMozAfterPaintPending' is true or not.[3]
>    In this step, 'isMozAfterPaintPending' returns true due to step 2).
>    Then 'reftest-content.js' went into the 'STATE_WAITING_TO_FINISH' state.
> 
> 4) MozAfterPaint event doesn't happen never after this step,

To be clear, we believe this is the right behavior, right?  I mean there is no need to fire MozAfterPaint at all, more precisely there is no trigger to paint something.

> I think that we might not need to wait 'MozAfterPaint' event in this
> step(i.e. STATE_WAITING_TO_FINISH) since target reftests removed
> 'reftest-wait' attribute in this step. Or.. We need to drop a timer of above
> step 2), but I'm not sure that to remove this timer affect other code now.

So, after some more thought, we should stop the timer to avoid this kind of failures somehow, otherwise we can easily fall into the same pit fall in various test cases (not only reftest, I think).
(In reply to Hiroyuki Ikezoe (:hiro) from comment #52)
> Thanks for the explanation!
> 
> (In reply to Mantaroh Yoshinaga[:mantaroh] from comment #51)
> > 3) A timer of step 1) happen here, then check whether a
> > 'isMozAfterPaintPending' is true or not.[3]
> >    In this step, 'isMozAfterPaintPending' returns true due to step 2).
> >    Then 'reftest-content.js' went into the 'STATE_WAITING_TO_FINISH' state.
> > 
> > 4) MozAfterPaint event doesn't happen never after this step,
> 
> To be clear, we believe this is the right behavior, right?  I mean there is
> no need to fire MozAfterPaint at all, more precisely there is no trigger to
> paint something.
> 

Yes. This test behavior is correct since isMozAfterPaint returns true.

> > I think that we might not need to wait 'MozAfterPaint' event in this
> > step(i.e. STATE_WAITING_TO_FINISH) since target reftests removed
> > 'reftest-wait' attribute in this step. Or.. We need to drop a timer of above
> > step 2), but I'm not sure that to remove this timer affect other code now.
> 
> So, after some more thought, we should stop the timer to avoid this kind of
> failures somehow, otherwise we can easily fall into the same pit fall in
> various test cases (not only reftest, I think).

Yes, we should test will fail if test uses 'MozAfterPaint' directly or other test suit use it.
Actually, WPT will happen this problem. So I was about to change the marionette, but I think we had better to remove this timer as well. (I'm wondering why we choice 1000 ms..)
(In reply to Mantaroh Yoshinaga[:mantaroh] from comment #53)

> Yes, we should test will fail if test uses 'MozAfterPaint' directly or other
> test suit use it.
> Actually, WPT will happen this problem. So I was about to change the
> marionette, but I think we had better to remove this timer as well. (I'm
> wondering why we choice 1000 ms..)

Note that I don't suggest removing the timer entirely, I'd suggest to stop/cancel the timer somehow.
(In reply to Mantaroh Yoshinaga[:mantaroh] from comment #53)
> Yes, we should test will fail if test uses 'MozAfterPaint' directly or other
> test suit use it.
> Actually, WPT will happen this problem. So I was about to change the
> marionette, but I think we had better to remove this timer as well. (I'm
> wondering why we choice 1000 ms..)

mattwoodrow,
We create the timer which setting the 'paint compress flag' and call 'PressShell::ScheduleViewManagerFlush()' again after 1000 ms[1]. If this timer occurs, we call the 'nsRefreshDriver::ScheduleViewManagerFlush()', as result of it, test suit will mistake that test should wait for 'MozAfterPaint' event since 'isMozAfterPaint' returns true. This flag will be reset by nsRefreshDriver::Tick().

Now I have a question about compress mode of paint. Should we call the 'nsRefreshDriver::ScheduleViewManagerFlush()' again in this timer? Or Can we know when a timer is no longer need to?

[1] https://hg.mozilla.org/mozilla-central/annotate/ba283baf4e98aa3a5f45a17981077b98697aa73a/layout/base/PresShell.cpp#l3734 (This timer introduced by bug 946952)
Flags: needinfo?(matt.woodrow)
Why don't we get a MozAfterPaint event at the end?

The idea is that if we return true from isMozAfterPaintPending, then we're guaranteeing that we'll fire MozAfterPaint soon.

The implementation of nsPresContext::IsDOMPaintEventPending() records an empty rect invalidation when it returns true, so that we'll have something to send to MozAfterPaint even if we don't actually invalidate anything on the next paint.

So we're returning true from isMozAfterPaintPending (and storing an empty rect invalidation on the pres context), and the reftest harness decides to wait for MozAfterPaint. Then what happens? Do we get the scheduled view manager flush? Why isn't this delivering MozAfterPaint?

Is it because the paint ended up making no changes, we never send anything to the compositor, and then we never get DidComposite back? In that case, I think we need to do something to flush pending MozAfterPaints from the pres context immediately.


I also think we can get rid of the PAINT_DELAYED_COMPRESS code entirely (it was added to workaround a b2g issue), but I think we need to fix MozAfterPaint delivery properly, since there's probably other ways to hit this bug.
Flags: needinfo?(matt.woodrow)
Thanks, mattwoodrow.

(In reply to Matt Woodrow (:mattwoodrow) from comment #56)
> Why don't we get a MozAfterPaint event at the end?
> 
> The idea is that if we return true from isMozAfterPaintPending, then we're
> guaranteeing that we'll fire MozAfterPaint soon.
> 
> The implementation of nsPresContext::IsDOMPaintEventPending() records an
> empty rect invalidation when it returns true, so that we'll have something
> to send to MozAfterPaint even if we don't actually invalidate anything on
> the next paint.
> 
> So we're returning true from isMozAfterPaintPending (and storing an empty
> rect invalidation on the pres context), and the reftest harness decides to
> wait for MozAfterPaint. Then what happens? Do we get the scheduled view
> manager flush? Why isn't this delivering MozAfterPaint?
> 

Yes, a current implementation will send a MozAfterPaint event with empty rect if the paint ended up making no changes. Several animation tests are using this MozAfterPaint event as a trigger for checking the compositor's animation properties. If gecko consumes a time for sending animation to the compositor, the gecko will fire the MozAfterPaint event due to software timer[1]. That is 'MozAfterPaint' might contain an after painting event which is not finishing paint process. I guess this case affect several tests, like talos/reftest/wpt. Especially, this problem affects to animation tests since animation tests check detail condition after painting content.

So I decided removing this timer(attachment 8930791 [details]), however, it means that test suite(reftest/marionette) should care the case which does not receiving MozAfterPaint. (As mentioned you, it's the case of the paint ended up making no changes) As far as I know, we need to fix the marionette for caring this case.[2]

[1] https://searchfox.org/mozilla-central/rev/7fb999d1d39418fd331284fab909df076b967ac6/layout/base/nsPresContext.cpp#3461-3467
[2] https://searchfox.org/mozilla-central/rev/7fb999d1d39418fd331284fab909df076b967ac6/testing/marionette/listener.js#1683-1692

try:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=6144594c0cbe34fc06317bf2d9db368cc9c3f3cb

mattwoodrow:
I think that user who uses this MozAfterPaint event need to care the case which doesn't happen MozAfterPaint event because current MozAfterPaint contains a fake event which isn't finishing the paint process. So I'd like to remove this fake event and change test suite. What do you think this modification?
Flags: needinfo?(matt.woodrow)
If I understand the Marionette patch correctly it resolves the
promise after 1s, irregardless of whether MozAfterPaint fired.
If MozAfterPaint fires first, the timer isn’t cleared and calls
resolve() after 1s anyway.  Maybe this doesn’t matter because
maybeResolve will have been called first.

(Also note that any change to testing/marionette must also be
reviewed by a Marionette peer.)
WebRender also does not fire MozAfterPaint on some conditions.  It's worth checking.  I guess we can solve this and that bug altogether.
See Also: → 1425315
(In reply to Hiroyuki Ikezoe (:hiro) from comment #59)
> WebRender also does not fire MozAfterPaint on some conditions.  It's worth
> checking.  I guess we can solve this and that bug altogether.

The WebRender problem is that we fire too many MozAfterPaint since we don't know when to stop. So it's probably a different problem.
(In reply to Ethan Lin[:ethlin] from comment #60)
> (In reply to Hiroyuki Ikezoe (:hiro) from comment #59)
> > WebRender also does not fire MozAfterPaint on some conditions.  It's worth
> > checking.  I guess we can solve this and that bug altogether.
> 
> The WebRender problem is that we fire too many MozAfterPaint since we don't
> know when to stop. So it's probably a different problem.

Yeah, we've also been seeing unexpected MozAfterPaint events which should not be observed in bug 1395971.  That's the one triggered by the timer.
(In reply to Mantaroh Yoshinaga[:mantaroh] from comment #57)
> I think that user who uses this MozAfterPaint event need to care the case
> which doesn't happen MozAfterPaint event because current MozAfterPaint
> contains a fake event which isn't finishing the paint process. So I'd like
> to remove this fake event and change test suite. What do you think this
> modification?

The current behaviour makes it very easy to write correct JS code, if isMozAfterPaintPending returns true, then you wait for a MozAfterPaint event.

The proposed change means that all callers need to handle the case where the event never gets delivered, which seems annoying and bug prone. We also need to go through all the existing callers (including addons!) and make sure that they are updated correctly to handle this new behaviour.

I'd much prefer if we kept the current behaviour, fixed the bugs where we don't deliver MozAfterPaint normally, and then remove the software time (since it's basically a hack to workaround the bugs).

[1] is at least one place that I'm pretty sure we have this bug. That code runs when we tried to forward changes to the compositor, but it turned out nothing had changed. We return the transaction id to the refresh driver, but nothing will ever tell the pres context to fire the MozAfterPaint event.

The correct fix for this is a bit complicated, since if we're currently waiting for the previous transaction id to be returned from the compositor, we don't want to send the event for those early (or out of order).

I think the correct behaviour is:

* If there are no invalidations on the pres context for the skipped transaction, then do nothing since we must not have promised to anyone that we would fire MozAfterPaint.
* If there are invalidations for the skipped transaction, and also invalidations for an earlier transaction id, then we need to combine them so that the MozAfterPaint events for both gets sent when the earlier one finishes.
* If there are invalidations for the skipped transaction only, then fire MozAfterPaint immediately.



[1] https://searchfox.org/mozilla-central/source/gfx/layers/client/ClientLayerManager.cpp#772
Flags: needinfo?(matt.woodrow)
Thanks, mattwoodrow.

(In reply to Matt Woodrow (:mattwoodrow) from comment #62)
> (In reply to Mantaroh Yoshinaga[:mantaroh] from comment #57)
> > I think that user who uses this MozAfterPaint event need to care the case
> > which doesn't happen MozAfterPaint event because current MozAfterPaint
> > contains a fake event which isn't finishing the paint process. So I'd like
> > to remove this fake event and change test suite. What do you think this
> > modification?
> 
> The current behaviour makes it very easy to write correct JS code, if
> isMozAfterPaintPending returns true, then you wait for a MozAfterPaint event.
> 
> The proposed change means that all callers need to handle the case where the
> event never gets delivered, which seems annoying and bug prone. We also need
> to go through all the existing callers (including addons!) and make sure
> that they are updated correctly to handle this new behaviour.
> 

I didn't think about add-ons.
Yes, as you mentioned, this proposed fix will affect many tests and add-ons.

> I'd much prefer if we kept the current behaviour, fixed the bugs where we
> don't deliver MozAfterPaint normally, and then remove the software time
> (since it's basically a hack to workaround the bugs).
> 

OK. I'll try it. I don't know why added this timer(100ms), but as far as I can see, several tests of wpt and mochitest will not receive the MozAfterPaint if remove this timer, so we should deliver a MozAfterPaint event in this case. 
(https://treeherder.mozilla.org/#/jobs?repo=try&revision=da9b4cffb73347d6701aa05bf737c80cb10f46af)

> [1] is at least one place that I'm pretty sure we have this bug. That code
> runs when we tried to forward changes to the compositor, but it turned out
> nothing had changed. We return the transaction id to the refresh driver, but
> nothing will ever tell the pres context to fire the MozAfterPaint event.
> 

Thank you for your advice. In my local environment, If added firing MozAfterPaint event in this revoking transaction code, those test will pass without a timeout:
https://hg.mozilla.org/try/rev/6383d2bd34197df023331ed147684a504410ab6a#l3.13
(https://treeherder.mozilla.org/#/jobs?repo=try&revision=1fd06d317ac77ead4941a98b4b63ce4c6de6488d&group_state=expanded)

But, in this result, transition and full screen test are failed on QuantumRender. I'll look into this reason.
Note to self:

These two webrender tests are failed:

* test_fullscreen-api.html : I guess this is same to bug 1398112. My local running log pointed out that test will wait for the resize event. I can reproduce this failure easy on the only xvfb + webrender enable environment. (i.e. I couldn't reproduce this bug in headless mode or actual window)

* test_transitions_replacement_on_busy_frame.html : It seems like this test will wait for finishing the first transition. According to try log[1]:
  1) This test waits for finishing the first transition at waitForPaint().[2]
  2) Set the second transition and get current property from second animation effect[3]
  3) Wait 300ms and wait for MozAfterPaint, then get current property from second animation effect[4]

  I think that step 1) behavior doesn't correct since this test expected that continue the following test immediately after resolving firstTransition.ready() and waitForPaint(). This firstTransition.ready() resolved immediately, but waitForPaint() will not resolve.(Wait after 100s, i.e. finishing the first transition.) This means that gecko will send the MozAfterPaint event only if a transition is finished on webrender.

[1] https://treeherder.mozilla.org/logviewer.html#?job_id=156833620&repo=try&lineNumber=8545
[2] https://searchfox.org/mozilla-central/rev/1f9b4f0f0653b129b4db1b9fc5e9068054afaac0/layout/style/test/file_transitions_replacement_on_busy_frame.html#52
[3] https://searchfox.org/mozilla-central/rev/1f9b4f0f0653b129b4db1b9fc5e9068054afaac0/layout/style/test/file_transitions_replacement_on_busy_frame.html#58-66
[4] https://searchfox.org/mozilla-central/rev/1f9b4f0f0653b129b4db1b9fc5e9068054afaac0/layout/style/test/file_transitions_replacement_on_busy_frame.html#82-91
[5] https://searchfox.org/mozilla-central/rev/1f9b4f0f0653b129b4db1b9fc5e9068054afaac0/dom/animation/test/mochitest.ini#107,118,123,129,133
(In reply to Mantaroh Yoshinaga[:mantaroh] from comment #64)
> Note to self:
> 
> These two webrender tests are failed:
> 
> * test_fullscreen-api.html : I guess this is same to bug 1398112. My local
> running log pointed out that test will wait for the resize event. I can
> reproduce this failure easy on the only xvfb + webrender enable environment.
> (i.e. I couldn't reproduce this bug in headless mode or actual window)
> 
> * test_transitions_replacement_on_busy_frame.html : It seems like this test
> will wait for finishing the first transition. According to try log[1]:
>   1) This test waits for finishing the first transition at waitForPaint().[2]
>   2) Set the second transition and get current property from second
> animation effect[3]
>   3) Wait 300ms and wait for MozAfterPaint, then get current property from
> second animation effect[4]
> 
>   I think that step 1) behavior doesn't correct since this test expected
> that continue the following test immediately after resolving
> firstTransition.ready() and waitForPaint(). This firstTransition.ready()
> resolved immediately, but waitForPaint() will not resolve.(Wait after 100s,
> i.e. finishing the first transition.) This means that gecko will send the
> MozAfterPaint event only if a transition is finished on webrender.
> 
> [1]
> https://treeherder.mozilla.org/logviewer.
> html#?job_id=156833620&repo=try&lineNumber=8545
> [2]
> https://searchfox.org/mozilla-central/rev/
> 1f9b4f0f0653b129b4db1b9fc5e9068054afaac0/layout/style/test/
> file_transitions_replacement_on_busy_frame.html#52
> [3]
> https://searchfox.org/mozilla-central/rev/
> 1f9b4f0f0653b129b4db1b9fc5e9068054afaac0/layout/style/test/
> file_transitions_replacement_on_busy_frame.html#58-66
> [4]
> https://searchfox.org/mozilla-central/rev/
> 1f9b4f0f0653b129b4db1b9fc5e9068054afaac0/layout/style/test/
> file_transitions_replacement_on_busy_frame.html#82-91
> [5]
> https://searchfox.org/mozilla-central/rev/
> 1f9b4f0f0653b129b4db1b9fc5e9068054afaac0/dom/animation/test/mochitest.
> ini#107,118,123,129,133

kats:

This bug will fix that gecko fire the MozAfterPaint correctly, then I will remove the timer which is firing the MozAfterPaint after 100ms. As result of it, it seems like that some situation of webrender will not send MozAfterPaint, then two mochitest of webrender is failed due to MozAfterPaint event.(test_fullscreen-api.html / test_transitions_replacement_on_busy_frame.html)

For workaround, I want to disable these tests of webrender as dup of bug 1424752, how do you think about this?
Flags: needinfo?(bugmail)
(In reply to Mantaroh Yoshinaga[:mantaroh] from comment #65)
> For workaround, I want to disable these tests of webrender as dup of bug
> 1424752, how do you think about this?

Just to make sure I understand: you're fixing some stuff in gecko, but it's causing a couple of mochitests to fail only when webrender is enabled? If so I agree that you should just disable those tests with "skip-if = webrender" since we know we have pre-existing issues surrounding MozAfterPaint with webrender. And yeah, you can mark it as "will be fixed by bug 1424752" or something if you need. I have a spreadsheet that is tracking the webrender failures in more detail and I will update that too. Thanks!
Flags: needinfo?(bugmail)
From the try in comment 64;

@@ -2185,16 +2185,22 @@ nsRefreshDriver::RevokeTransactionId(uin
 {
   MOZ_ASSERT(aTransactionId == mPendingTransaction);
   if (mPendingTransaction == mCompletedTransaction + 2 &&
       mWaitingForTransaction) {
     MOZ_ASSERT(!mSkippedPaints, "How did we skip a paint when we're in the middle of one?");
     FinishedWaitingForTransaction();
   }
   mPendingTransaction--;
+
+  // Fire did paint event.
+  nsPresContext* pc = GetPresContext();
+  nsPresContext* rootContext = pc ? pc->GetRootPresContext() : nullptr;
+  MOZ_ASSERT(rootContext);
+  rootContext->NotifyDidPaintForSubtree(aTransactionId);
 }

Mantaroh, could you please elaborate this?  IIUC, RevokeTransactionId is also called when the transaction failed.  It means that bug 1395971 will not be fixed by this bug?
(In reply to Hiroyuki Ikezoe (:hiro) from comment #67)
> From the try in comment 64;
> 
> @@ -2185,16 +2185,22 @@ nsRefreshDriver::RevokeTransactionId(uin
>  {
>    MOZ_ASSERT(aTransactionId == mPendingTransaction);
>    if (mPendingTransaction == mCompletedTransaction + 2 &&
>        mWaitingForTransaction) {
>      MOZ_ASSERT(!mSkippedPaints, "How did we skip a paint when we're in the
> middle of one?");
>      FinishedWaitingForTransaction();
>    }
>    mPendingTransaction--;
> +
> +  // Fire did paint event.
> +  nsPresContext* pc = GetPresContext();
> +  nsPresContext* rootContext = pc ? pc->GetRootPresContext() : nullptr;
> +  MOZ_ASSERT(rootContext);
> +  rootContext->NotifyDidPaintForSubtree(aTransactionId);
>  }
> 
> Mantaroh, could you please elaborate this?  IIUC, RevokeTransactionId is
> also called when the transaction failed.  It means that bug 1395971 will not
> be fixed by this bug?

You mean that test_deferred_start will happen the transaction failed?
The patch of this bug fixed this test, and I think test failure will not happen since bug 1395971 will receive a MozAfterPaint soon after calling the IsDOMPaintEventPending. If it will happen, I think that it means that test is wrong.
Comment on attachment 8944335 [details]
Bug 1419226 - Part 5. Skip the test_fullscreen-api.html and test_transitions_replacement_on_busy_frame.html on webrender.

https://reviewboard.mozilla.org/r/212314/#review220136
Attachment #8944335 - Flags: review?(bugmail) → review+
(In reply to Mantaroh Yoshinaga[:mantaroh] from comment #68)
> (In reply to Hiroyuki Ikezoe (:hiro) from comment #67)
> > From the try in comment 64;
> > 
> > @@ -2185,16 +2185,22 @@ nsRefreshDriver::RevokeTransactionId(uin
> >  {
> >    MOZ_ASSERT(aTransactionId == mPendingTransaction);
> >    if (mPendingTransaction == mCompletedTransaction + 2 &&
> >        mWaitingForTransaction) {
> >      MOZ_ASSERT(!mSkippedPaints, "How did we skip a paint when we're in the
> > middle of one?");
> >      FinishedWaitingForTransaction();
> >    }
> >    mPendingTransaction--;
> > +
> > +  // Fire did paint event.
> > +  nsPresContext* pc = GetPresContext();
> > +  nsPresContext* rootContext = pc ? pc->GetRootPresContext() : nullptr;
> > +  MOZ_ASSERT(rootContext);
> > +  rootContext->NotifyDidPaintForSubtree(aTransactionId);
> >  }
> > 
> > Mantaroh, could you please elaborate this?  IIUC, RevokeTransactionId is
> > also called when the transaction failed.  It means that bug 1395971 will not
> > be fixed by this bug?
> 
> You mean that test_deferred_start will happen the transaction failed?
> The patch of this bug fixed this test, and I think test failure will not
> happen since bug 1395971 will receive a MozAfterPaint soon after calling the
> IsDOMPaintEventPending. If it will happen, I think that it means that test
> is wrong.

hiro suggested that gecko might send MozAfterPaint when the unexpected, like test_deferred_start.html. If gecko failed to forward the transaction, the gecko will send MozAfterPaint event due to applying the patch of this bug.

I think that this case will happen following cases:
 1) Transition doesn't have a any paint sets. (i.e. EndTransaction() return true)
 2) IPC related error. (i.e. EndTransition() return false)

If I understand correctly, I think this test will not happen these transaction failure cases since test sets a new animation before waiting for a MozAfterPaint(i.e. transaction has paint sets in this tick). 

I tried this test with this patch:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=f8f6402765b1c8342cd1a83e1c18dc3d0f987fdb&group_state=expanded
(Mochitests-e10s-1 of Windows-64-opt is intermittent test which needs to fix.)
I still don't quite understand....

It seems to me there are other failure cases in ShadowLayerForwarder::EndTransaction(), mShadowManager->SendInitReadLocks(locks) failure, or mShadowManager->SendUpdate(info).  What about the cases?
(In reply to Hiroyuki Ikezoe (:hiro) from comment #76)
> I still don't quite understand....
> 
> It seems to me there are other failure cases in
> ShadowLayerForwarder::EndTransaction(),
> mShadowManager->SendInitReadLocks(locks) failure, or
> mShadowManager->SendUpdate(info).  What about the cases?

I think this pattern is IPC error, ShadowLayerForwarder::EndTransaction() return false, so 

(In reply to Hiroyuki Ikezoe (:hiro) from comment #77)
> Hmm, test_deferred_start.html failed on the try for some reasons.
> 
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=f8f6402765b1c8342cd1a83e1c18dc3d0f987fdb&group_state=e
> xpanded&selectedJob=157665537

Ah, there are two bugs which are related to deferred tests.(bug 1363957 and bug 1395971)
I think that this bug will fix the bug 1395971(Desktop), but I'm not sure that this bug will fix the bug 1363957(Android). So I want to fix this bug in the bug 1363957. In any case, I'll look into this android phenomenon on bug 1363957. I think that this test will happen on emulator and debug build.
Sorry I was thinking confused these bugs.
Comment on attachment 8944334 [details]
Bug 1419226 - Part 4. Fire MozAfterPaint when revoking transactions.

https://reviewboard.mozilla.org/r/212312/#review220880

My biggest concern here is that we sent MozAfterPaint for transaction X when revoking that id. That means that the next paint will also get id X, and so we can end up firing MozAfterPaint a second time for the same id number.
I'm not sure if any callers pay enough attention to notice that, but it's a bit confusing at least.

It would be nice if we could change the way revoking a transaction works so that we still increment the id counter for the next paint. I think we'd need nsRefreshDriver to track a third number in that case, splitting mPendingTransaction into mPendingTransaction and mNextTransaction. When we revoke, we decrement mPendingTransaction, but not mNextTransaction.

The other issue is that transaction X-1 might still be in progress (on the compositor). If we fire MozAfterPaint for X when it gets revoked, then that will fire the MozAfterPaint events for all numbers lower (including X-1), which means we're firing it too early. Firing MozAfterPaint for only X would mean we'd fire them out of order, which is worse.

I think we need some extra logic here:

*If there's no invalidation rects on the pres context for X, then no callers are waiting for MozAfterPaint, so we can just skip it entirely and revoke the transaction as before.

* If there's invalidation rects for X, and also invalidation rects for something smaller than X, we want to defer sending MozAfterPaint for X until the lower numbers are complete.

* If there's invalidations rects for X, but no previous ones, then we can deliver MozAfterPaint for X immediately.
Comment on attachment 8944334 [details]
Bug 1419226 - Part 4. Fire MozAfterPaint when revoking transactions.

https://reviewboard.mozilla.org/r/212312/#review220880

Sorry for replying so late.

I addressed two problems which occurring phenomenon when revoking transaction.
First is gecko issues the duplicated transaction id when revoking transaction. This will fix by creating a new tracking number in refresh driver.
Second is we need to wait for firing a MozAfterPaint until a previous compositor return a did paint. For fix it, this patch introduced the mIsWaitingPreviousTransaction member in TransactionInvalidation structure.
Comment on attachment 8944334 [details]
Bug 1419226 - Part 4. Fire MozAfterPaint when revoking transactions.

https://reviewboard.mozilla.org/r/212312/#review223986

Looks great!
Attachment #8944334 - Flags: review?(matt.woodrow) → review+
Thanks mattwoodrow.

Try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=aaab4afad69812b8e86329bf2f58257b348d1ee4&group_state=expanded&selectedJob=160742920

Some mochitest are failed on WebRender environment and dom/animation/test/mozilla/test_restyles.html on Windows-opt and macOS-opt.

* mochitest of WebRender :
  - WebRenderLayerManager doesn't override the GetLastTransactionId(). So webrender calls the Notifyinvalidation with transactionId=0 when painting always. Current code will fire MozAfterPaint event via timer of EnsureEventualDidPaintEvent function. Maybe I think that MozAfterPaint mechanism doesn't work correctly on WebRender now.
(In reply to Mantaroh Yoshinaga[:mantaroh] from comment #87)
> Maybe I think that MozAfterPaint mechanism doesn't work correctly on WebRender now.

I think you already concluded that in patch "Part 5"? If so can you please make a patch to skip this test too then so we can (finally) land this.
(In reply to Mantaroh Yoshinaga[:mantaroh] from comment #87)
>   - WebRenderLayerManager doesn't override the GetLastTransactionId().

This looks like a bug. I filed bug 1437036 for it, and I'll investigate/address it there. Feel free to skip the test here.
(In reply to Mantaroh Yoshinaga[:mantaroh] from comment #87)
> dom/animation/test/mozilla/test_restyles.html on Windows-opt and macOS-opt.

The failure will be fixed by bug 1437248.
Thanks kats, hiro.

I'd like to land the patch of this bug after these bug are fixed.
The patch for bug 1437248 has already landed on autoland.
Blocks: 1382841
mattwoodrow,

After landing the patch 6 of bug 1429932, several tests are failed if I remove the timer of MozAfterPaint.(e.g. test_bug1013412.html[1] / dom/events/test/test_continuous_wheel_events.html[2])

Is changeset of bug 1429932 affect the MozAfterPaint of scrolling?
(i.e. these tests are succeeded due to timer of MozAfterPaint).

It seems like gecko will not receive the MozAfterPaint after receiving the apz-repatins-flushed in EventUtils.js[3]. If I removed the setting the 'nsDisplayList::PAINT_IDENTICAL_DISPLAY_LIST' code[4], these tests are passed.

[1] https://searchfox.org/mozilla-central/source/dom/events/test/test_bug1013412.html
[2] https://searchfox.org/mozilla-central/source/dom/events/test/test_continuous_wheel_events.html
[3] https://searchfox.org/mozilla-central/rev/14d933246211b02f5be21d2e730a57cf087c6606/testing/mochitest/tests/SimpleTest/EventUtils.js#640
[4] https://searchfox.org/mozilla-central/rev/14d933246211b02f5be21d2e730a57cf087c6606/layout/painting/nsDisplayList.cpp#2741
Flags: needinfo?(matt.woodrow)
Do you know why this is?

The idea of part 4 in this bug was that we would still fire MozAfterPaint if we decided not to do an actual paint. The PAINT_IDENTICAL_DISPLAY_LIST option makes this more common, but shouldn't affect the outcome.
Flags: needinfo?(matt.woodrow)
Blocks: 1415065
I'm concerned that this bug is taking too long and Mantaroh is unable to keep up with the pace of change on WebRender in particular. I suggest we land the parts of this bug that can be landed (parts 1 and 3 from what I understand) then spin off a separate bug for the remainder (that hopefully someone more familiar with graphics, and particularly WebRender can tackle).
(In reply to Brian Birtles (:birtles) from comment #95)
> I'm concerned that this bug is taking too long and Mantaroh is unable to
> keep up with the pace of change on WebRender in particular. I suggest we
> land the parts of this bug that can be landed (parts 1 and 3 from what I
> understand) then spin off a separate bug for the remainder (that hopefully
> someone more familiar with graphics, and particularly WebRender can tackle).

That seems reasonable to me. Fixing intermittent bugs is nice, but since this is just fixing bugs in our testing, not bugs in the product, then it's not worth sinking too much time into.

That said, I'm always happy to help tracking down the last few issues.
Attachment #8944335 - Attachment is obsolete: true
Attachment #8930791 - Attachment is obsolete: true
Attachment #8944334 - Attachment is obsolete: true
Thanks, Brian, mattwoordrow,

I rebased this patches. A try result of applying the patch 1 and patch 3 looks working well. I will land this patches.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=90004bd7ea1c2d192374a6d4d24f8c8c0c2969dc&group_state=expanded
Note: The failure of mac environment is the task cluster failure(Infra issue).

Removing the timer of MozAfterPaint is tracked by another bug which I will file.
Blocks: 1445570
Pushed by mantaroh@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/6b81e68d487b
Part 1.Change observing target window of MozAfterPaint. r=mconley
https://hg.mozilla.org/integration/autoland/rev/3606a48f0183
Part 2. Wait for MozAfterPaint instead of waiting for frames on file_deferred_start.html . r=hiro
No longer blocks: 1415065
(In reply to Matt Woodrow (:mattwoodrow) from comment #96)
> That seems reasonable to me. Fixing intermittent bugs is nice, but since
> this is just fixing bugs in our testing, not bugs in the product, then it's
> not worth sinking too much time into.
> 
> That said, I'm always happy to help tracking down the last few issues.

Thanks Matt! I've CC'ed you on the other bug so hopefully someone will be available and motivated to pick that up soon.
https://hg.mozilla.org/mozilla-central/rev/6b81e68d487b
https://hg.mozilla.org/mozilla-central/rev/3606a48f0183
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
See Also: → 1489358
See Also: → 1491883
You need to log in before you can comment on or make changes to this bug.