Closed Bug 1248078 Opened 8 years ago Closed 8 years ago

Intermittent test_frame_reconstruction.html | APZ animation in progress - scrollTop is now 0 | Frame reconstruction triggered - reconstruction triggered 0 times

Categories

(Core :: Panning and Zooming, defect)

47 Branch
Unspecified
All
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla49
Tracking Status
firefox47 --- wontfix
firefox48 --- fixed
firefox49 --- fixed

People

(Reporter: aryx, Assigned: kats)

References

Details

(Keywords: intermittent-failure, Whiteboard: gfx-noted)

Attachments

(2 files)

https://treeherder.mozilla.org/logviewer.html#?job_id=21644458&repo=mozilla-inbound

08:23:37     INFO -  3429 INFO TEST-START | gfx/layers/apz/test/mochitest/test_frame_reconstruction.html
08:23:37     INFO -  Flushed APZ repaints, waiting for callback...
08:23:37     INFO -  Flushed APZ repaints, waiting for callback...
08:23:37     INFO -  TEST-INFO | started process screentopng
08:23:40     INFO -  TEST-INFO | screentopng: exit 0
08:23:40     INFO -  3430 INFO must wait for load
08:23:40     INFO -  3431 INFO TEST-UNEXPECTED-FAIL | gfx/layers/apz/test/mochitest/test_frame_reconstruction.html | APZ animation in progress - scrollTop is now 0
08:23:40     INFO -      runTest@gfx/layers/apz/test/mochitest/test_frame_reconstruction.html:174:3
08:23:40     INFO -      driveTest@gfx/layers/apz/test/mochitest/test_frame_reconstruction.html:213:13
08:23:40     INFO -      setTimeout handler*SimpleTest_setTimeoutShim@SimpleTest/SimpleTest.js:622:12
08:23:40     INFO -      flushApzRepaints/repaintDone@gfx/layers/apz/test/mochitest/apz_test_utils.js:109:5
08:23:40     INFO -  3432 INFO TEST-PASS | gfx/layers/apz/test/mochitest/test_frame_reconstruction.html | APZ animation not yet completed - scrollTop is now 0
08:23:40     INFO -  Flushed APZ repaints, waiting for callback...
08:23:40     INFO -  3433 INFO TEST-PASS | gfx/layers/apz/test/mochitest/test_frame_reconstruction.html | APZ animation not yet completed - scrollTop is now 54
08:23:40     INFO -  Not taking screenshot here: see the one that was previously logged
08:23:40     INFO -  3434 INFO TEST-UNEXPECTED-FAIL | gfx/layers/apz/test/mochitest/test_frame_reconstruction.html | Frame reconstruction triggered - reconstruction triggered 0 times
08:23:40     INFO -      runTest@gfx/layers/apz/test/mochitest/test_frame_reconstruction.html:195:3
08:23:40     INFO -      driveTest@gfx/layers/apz/test/mochitest/test_frame_reconstruction.html:213:13
08:23:40     INFO -      setTimeout handler*SimpleTest_setTimeoutShim@SimpleTest/SimpleTest.js:622:12
08:23:40     INFO -      flushApzRepaints/repaintDone@gfx/layers/apz/test/mochitest/apz_test_utils.js:109:5
08:23:40     INFO -  Flushed APZ repaints, waiting for callback...
08:23:40     INFO -  3435 INFO TEST-PASS | gfx/layers/apz/test/mochitest/test_frame_reconstruction.html | Element should have scrolled by 200px
08:23:40     INFO -  MEMORY STAT | vsize 338MB | residentFast 70MB | heapAllocated 17MB
08:23:40     INFO -  3436 INFO TEST-OK | gfx/layers/apz/test/mochitest/test_frame_reconstruction.html | took 1598ms
Component: Graphics: Layers → Panning and Zooming
Whiteboard: gfx-noted
See Also: → 1248084
Assignee: nobody → bugmail.mozilla
Blocks: 1235899
I got a recording of this with rr chaos mode locally, I'll investigate it at some point today or tomorrow.
This appears to be caused by some sort of race between the content repaint request and the scroll acknowledgement. In the parent process I see the following sequence of events:

1) APZC::AcknowledgeScrollUpdate(), with mFrameMetrics.mScrollGeneration = 19. This is sent to the child process, but it is rejected because by then the child is at scroll generation 20.

2) APZC::RequestContentRepaint is called while mFrameMetrics.mScrollGeneration = 19 and mScrollOffset = 0,0. This is called on the compositor thread and so it redispatches itself to the main thread.

3) APZC::NotifyLayersUpdate is called with a smooth-scroll update, so mFrameMetrics.mScrollGeneration is bumped to 20 and AcknowledgeScrollUpdate() is called. This happens on the compositor thread, so in RenderFrameParent it redispatches itself to mUILoop.

4) A composite happens, moving mFrameMetrics.mScrollOffset to (0,18).

5) The redispatch from step (2) executes on the main thread. It picks up mFrameMetrics.mScrollGeneration = 20, mScrollOffset = 0,18, and sends that to the child process. On the child process this is thrown away because the last scroll origin is not clobberable by APZ (it was set by the smooth scroll that triggered step 3, and hasn't been acknowledged yet).

6) The redispatch from step (3) executes on the main thread. It is sent to the child process and successfully acknowledges scroll generation 20.

At this point we do the flush and read the scroll offset on the main thread, which is 0,0 but should have been 0,18. Fundamentally I think one problem is that a repaint request can go out on the main thread with a scroll generation that hasn't actually been acknowledged yet. This happens because mFrameMetrics.mScrollGeneration is updated on the compositor thread before and the acknowledgement only goes out later on the main thread. So making that more consistent would help here. Alternatively we could just combine the scroll acknowledgement update into the content repaint request, so that in step 5 the repaint request would have gotten applied because it implicitly carried the acknowledgement for scroll generation 20. I don't know if there's any good reasons to keep them as separate messages.
Attached patch Patch to reproSplinter Review
Slightly cheating but this patch seems to repro it for me.
I got rid of the scroll-acknowledgement message and it seemed to fix the problem. Try push:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=e6f79a31dc6b

I'll need to verify bug 1068802 didn't regress, and ideally write a test for that.
Summary: Intermittent /test_frame_reconstruction.html | APZ animation in progress - scrollTop is now 0 | Frame reconstruction triggered - reconstruction triggered 0 times → Intermittent test_frame_reconstruction.html | APZ animation in progress - scrollTop is now 0 | Frame reconstruction triggered - reconstruction triggered 0 times
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #5)
> I'll need to verify bug 1068802 didn't regress, and ideally write a test for
> that.

I'm not sure if that bug ever affected desktop. At any rate, I can't repro it anymore. That also makes it hard to write a test for it.
Comment on attachment 8727973 [details]
MozReview Request: Bug 1248078 - Remove the dedicated scroll acknowledgement message. r?botond

Hm, I see some test failures on the try push. Will need to investigate...
Attachment #8727973 - Flags: review?(botond)
Blocks: 1248084
See Also: 1248084
The test failures mentioned in comment 10 should be fixed by the patch on bug 1262432. I'll rebase this and see what happens.
OS: Unspecified → All
Version: Trunk → 47 Branch
The try push seems ok: https://treeherder.mozilla.org/#/jobs?repo=try&revision=374e91e15379&group_state=expanded - there's some M-4 failures in non-e10s but they look unrelated (specially since non-e10s means this code isn't getting exercised at all). I'll put up the rebased patch for review again.
Comment on attachment 8727973 [details]
MozReview Request: Bug 1248078 - Remove the dedicated scroll acknowledgement message. r?botond

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/38715/diff/1-2/
Attachment #8727973 - Flags: review?(botond)
Attachment #8727973 - Flags: review?(botond)
Comment on attachment 8727973 [details]
MozReview Request: Bug 1248078 - Remove the dedicated scroll acknowledgement message. r?botond

https://reviewboard.mozilla.org/r/38715/#review43415

::: gfx/layers/apz/src/AsyncPanZoomController.cpp
(Diff revision 2)
>    APZC_LOG("%p updating scroll offset from %s to %s\n", this,
>        Stringify(mFrameMetrics.GetScrollOffset()).c_str(),
>        Stringify(aScrollOffset).c_str());
>  
>    mFrameMetrics.UpdateScrollInfo(aScrollGeneration, aScrollOffset);
> -  AcknowledgeScrollUpdate();

This will need to be rebased against bug 1257641, which removes NotifyScrollUpdated().

::: gfx/layers/apz/util/APZCCallbackHelper.cpp:245
(Diff revision 2)
>      // If the pres shell resolution has changed on the content side side
>      // the time this repaint request was fired, consider this request out of date
>      // and drop it; setting a zoom based on the out-of-date resolution can have
>      // the effect of getting us stuck with the stale resolution.
>      if (presShellResolution != aMetrics.GetPresShellResolution()) {
>        return;

If we hit this early return, ResetScrollInfoIfGeneration() will not be called. Might that be a problem?
https://reviewboard.mozilla.org/r/38715/#review43415

> If we hit this early return, ResetScrollInfoIfGeneration() will not be called. Might that be a problem?

Hm, good question, I hadn't considered that. However, thinking about it, I think it shouldn't be a problem because the next repaint request that does successfully go through this code will do the reset. If *that* repaint request has a stale generation, then it should basically have the same effect as the code without this patch. (It means that the main thread advanced to a new generation between the two repaint requests, and in either case, the repaint request will not clobber the main thread scroll position).
Comment on attachment 8727973 [details]
MozReview Request: Bug 1248078 - Remove the dedicated scroll acknowledgement message. r?botond

Restoring r? since I think I addressed your concern in comment 20. Let me know if that's not the case.
Attachment #8727973 - Flags: review?(botond)
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #20)
> > If we hit this early return, ResetScrollInfoIfGeneration() will not be called. Might that be a problem?
> 
> Hm, good question, I hadn't considered that. However, thinking about it, I
> think it shouldn't be a problem because the next repaint request that does
> successfully go through this code will do the reset. If *that* repaint
> request has a stale generation, then it should basically have the same
> effect as the code without this patch. (It means that the main thread
> advanced to a new generation between the two repaint requests, and in either
> case, the repaint request will not clobber the main thread scroll position).

That makes sense. Thanks!
Attachment #8727973 - Flags: review?(botond) → review+
Comment on attachment 8727973 [details]
MozReview Request: Bug 1248078 - Remove the dedicated scroll acknowledgement message. r?botond

https://reviewboard.mozilla.org/r/38715/#review43677
https://hg.mozilla.org/mozilla-central/rev/1fd853200c99
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
No longer blocks: 1248084
I don't see any more of these in orange factor after this landed.
I backed this out for talos regressions that were unexpected, see bug 1266755 comment 4 for details.

https://hg.mozilla.org/integration/mozilla-inbound/rev/a45b816fdd30
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
https://hg.mozilla.org/mozilla-central/rev/8d62ba5ddb0b
Status: REOPENED → RESOLVED
Closed: 8 years ago8 years ago
Resolution: --- → FIXED
Target Milestone: mozilla48 → mozilla49
Probably too risky to uplift to 47, but we can let it bake some more and then decide about 48.
Comment on attachment 8727973 [details]
MozReview Request: Bug 1248078 - Remove the dedicated scroll acknowledgement message. r?botond

Approval Request Comment
[Feature/regressing bug #]: part of APZ
[User impact if declined]: possible to hit a rare race condition that leaves the user at an incorrect scroll offset, if the page is changing while they're scrolling. in practice it's unlikely that most people would even notice though.
[Describe test coverage new/current, TreeHerder]: there is a test that was failing intermittently, this patch fixes it
[Risks and why]: fairly low risk. the patch is mostly code deletion and it's been baking for a few weeks with no known regressions.
[String/UUID change made/needed]: none
Attachment #8727973 - Flags: approval-mozilla-aurora?
Comment on attachment 8727973 [details]
MozReview Request: Bug 1248078 - Remove the dedicated scroll acknowledgement message. r?botond

APZ issue, should fix a test failure in aurora
Attachment #8727973 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
I'm hitting a conflict uplifting this to aurora.

hg graft -er 8d62ba5ddb0b
grafting 340127:8d62ba5ddb0b "Bug 1248078 - Remove the dedicated scroll acknowledgement message. r=botond"
merging gfx/layers/apz/util/APZCCallbackHelper.cpp
warning: conflicts while merging gfx/layers/apz/util/APZCCallbackHelper.cpp! (edit, then use 'hg resolve --mark')
abort: unresolved conflicts, can't continue
(use hg resolve and hg graft --continue)
Flags: needinfo?(bugmail.mozilla)
Flags: needinfo?(bugmail.mozilla)
You need to log in before you can comment on or make changes to this bug.