Intermittent test_frame_reconstruction.html | APZ animation in progress - scrollTop is now 0 | Frame reconstruction triggered - reconstruction triggered 0 times

RESOLVED FIXED in Firefox 48

Status

()

Core
Panning and Zooming
RESOLVED FIXED
3 years ago
2 years ago

People

(Reporter: aryx, Assigned: kats)

Tracking

({intermittent-failure})

47 Branch
mozilla49
Unspecified
All
intermittent-failure
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox47 wontfix, firefox48 fixed, firefox49 fixed)

Details

(Whiteboard: gfx-noted)

MozReview Requests

()

Submitter Diff Changes Open Issues Last Updated
Loading...
Error loading review requests:

Attachments

(2 attachments)

https://treeherder.mozilla.org/logviewer.html#?job_id=21644458&repo=mozilla-inbound

08:23:37     INFO -  3429 INFO TEST-START | gfx/layers/apz/test/mochitest/test_frame_reconstruction.html
08:23:37     INFO -  Flushed APZ repaints, waiting for callback...
08:23:37     INFO -  Flushed APZ repaints, waiting for callback...
08:23:37     INFO -  TEST-INFO | started process screentopng
08:23:40     INFO -  TEST-INFO | screentopng: exit 0
08:23:40     INFO -  3430 INFO must wait for load
08:23:40     INFO -  3431 INFO TEST-UNEXPECTED-FAIL | gfx/layers/apz/test/mochitest/test_frame_reconstruction.html | APZ animation in progress - scrollTop is now 0
08:23:40     INFO -      runTest@gfx/layers/apz/test/mochitest/test_frame_reconstruction.html:174:3
08:23:40     INFO -      driveTest@gfx/layers/apz/test/mochitest/test_frame_reconstruction.html:213:13
08:23:40     INFO -      setTimeout handler*SimpleTest_setTimeoutShim@SimpleTest/SimpleTest.js:622:12
08:23:40     INFO -      flushApzRepaints/repaintDone@gfx/layers/apz/test/mochitest/apz_test_utils.js:109:5
08:23:40     INFO -  3432 INFO TEST-PASS | gfx/layers/apz/test/mochitest/test_frame_reconstruction.html | APZ animation not yet completed - scrollTop is now 0
08:23:40     INFO -  Flushed APZ repaints, waiting for callback...
08:23:40     INFO -  3433 INFO TEST-PASS | gfx/layers/apz/test/mochitest/test_frame_reconstruction.html | APZ animation not yet completed - scrollTop is now 54
08:23:40     INFO -  Not taking screenshot here: see the one that was previously logged
08:23:40     INFO -  3434 INFO TEST-UNEXPECTED-FAIL | gfx/layers/apz/test/mochitest/test_frame_reconstruction.html | Frame reconstruction triggered - reconstruction triggered 0 times
08:23:40     INFO -      runTest@gfx/layers/apz/test/mochitest/test_frame_reconstruction.html:195:3
08:23:40     INFO -      driveTest@gfx/layers/apz/test/mochitest/test_frame_reconstruction.html:213:13
08:23:40     INFO -      setTimeout handler*SimpleTest_setTimeoutShim@SimpleTest/SimpleTest.js:622:12
08:23:40     INFO -      flushApzRepaints/repaintDone@gfx/layers/apz/test/mochitest/apz_test_utils.js:109:5
08:23:40     INFO -  Flushed APZ repaints, waiting for callback...
08:23:40     INFO -  3435 INFO TEST-PASS | gfx/layers/apz/test/mochitest/test_frame_reconstruction.html | Element should have scrolled by 200px
08:23:40     INFO -  MEMORY STAT | vsize 338MB | residentFast 70MB | heapAllocated 17MB
08:23:40     INFO -  3436 INFO TEST-OK | gfx/layers/apz/test/mochitest/test_frame_reconstruction.html | took 1598ms
Component: Graphics: Layers → Panning and Zooming
Whiteboard: gfx-noted
See Also: → bug 1248084
Assignee: nobody → bugmail.mozilla
Blocks: 1235899
Duplicate of this bug: 1248144
I got a recording of this with rr chaos mode locally, I'll investigate it at some point today or tomorrow.
This appears to be caused by some sort of race between the content repaint request and the scroll acknowledgement. In the parent process I see the following sequence of events:

1) APZC::AcknowledgeScrollUpdate(), with mFrameMetrics.mScrollGeneration = 19. This is sent to the child process, but it is rejected because by then the child is at scroll generation 20.

2) APZC::RequestContentRepaint is called while mFrameMetrics.mScrollGeneration = 19 and mScrollOffset = 0,0. This is called on the compositor thread and so it redispatches itself to the main thread.

3) APZC::NotifyLayersUpdate is called with a smooth-scroll update, so mFrameMetrics.mScrollGeneration is bumped to 20 and AcknowledgeScrollUpdate() is called. This happens on the compositor thread, so in RenderFrameParent it redispatches itself to mUILoop.

4) A composite happens, moving mFrameMetrics.mScrollOffset to (0,18).

5) The redispatch from step (2) executes on the main thread. It picks up mFrameMetrics.mScrollGeneration = 20, mScrollOffset = 0,18, and sends that to the child process. On the child process this is thrown away because the last scroll origin is not clobberable by APZ (it was set by the smooth scroll that triggered step 3, and hasn't been acknowledged yet).

6) The redispatch from step (3) executes on the main thread. It is sent to the child process and successfully acknowledges scroll generation 20.

At this point we do the flush and read the scroll offset on the main thread, which is 0,0 but should have been 0,18. Fundamentally I think one problem is that a repaint request can go out on the main thread with a scroll generation that hasn't actually been acknowledged yet. This happens because mFrameMetrics.mScrollGeneration is updated on the compositor thread before and the acknowledgement only goes out later on the main thread. So making that more consistent would help here. Alternatively we could just combine the scroll acknowledgement update into the content repaint request, so that in step 5 the repaint request would have gotten applied because it implicitly carried the acknowledgement for scroll generation 20. I don't know if there's any good reasons to keep them as separate messages.
Created attachment 8721600 [details] [diff] [review]
Patch to repro

Slightly cheating but this patch seems to repro it for me.
I got rid of the scroll-acknowledgement message and it seemed to fix the problem. Try push:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=e6f79a31dc6b

I'll need to verify bug 1068802 didn't regress, and ideally write a test for that.

Comment 6

2 years ago
17 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* mozilla-inbound: 10
* fx-team: 5
* try: 2

Platform breakdown:
* linux32: 8
* linux64: 7
* android-4-3-armv7-api15: 2

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1248078&startday=2016-02-22&endday=2016-02-28&tree=all
Summary: Intermittent /test_frame_reconstruction.html | APZ animation in progress - scrollTop is now 0 | Frame reconstruction triggered - reconstruction triggered 0 times → Intermittent test_frame_reconstruction.html | APZ animation in progress - scrollTop is now 0 | Frame reconstruction triggered - reconstruction triggered 0 times

Comment 7

2 years ago
25 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* mozilla-inbound: 13
* try: 5
* fx-team: 5
* mozilla-central: 2

Platform breakdown:
* linux64: 20
* osx-10-6: 3
* linux32: 2

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1248078&startday=2016-02-29&endday=2016-03-06&tree=all
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #5)
> I'll need to verify bug 1068802 didn't regress, and ideally write a test for
> that.

I'm not sure if that bug ever affected desktop. At any rate, I can't repro it anymore. That also makes it hard to write a test for it.
Created attachment 8727973 [details]
MozReview Request: Bug 1248078 - Remove the dedicated scroll acknowledgement message. r?botond

Review commit: https://reviewboard.mozilla.org/r/38715/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/38715/
Attachment #8727973 - Flags: review?(botond)
Comment on attachment 8727973 [details]
MozReview Request: Bug 1248078 - Remove the dedicated scroll acknowledgement message. r?botond

Hm, I see some test failures on the try push. Will need to investigate...
Attachment #8727973 - Flags: review?(botond)

Comment 11

2 years ago
37 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* mozilla-inbound: 17
* fx-team: 8
* try: 5
* mozilla-central: 4
* mozilla-aurora: 3

Platform breakdown:
* linux64: 31
* linux32: 3
* windows7-32: 2
* android-4-3-armv7-api15: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1248078&startday=2016-03-07&endday=2016-03-13&tree=all

Comment 12

2 years ago
19 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* mozilla-inbound: 11
* fx-team: 4
* mozilla-aurora: 3
* try: 1

Platform breakdown:
* linux64: 16
* linux32: 2
* windows7-32: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1248078&startday=2016-03-14&endday=2016-03-20&tree=all

Comment 13

2 years ago
39 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* mozilla-inbound: 23
* fx-team: 8
* try: 3
* mozilla-aurora: 3
* mozilla-central: 2

Platform breakdown:
* linux64: 36
* linux32: 3

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1248078&startday=2016-03-21&endday=2016-03-27&tree=all

Comment 14

2 years ago
17 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* mozilla-inbound: 8
* mozilla-aurora: 3
* fx-team: 3
* try: 2
* mozilla-central: 1

Platform breakdown:
* linux64: 17

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1248078&startday=2016-03-28&endday=2016-04-03&tree=all

Comment 15

2 years ago
25 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* mozilla-inbound: 11
* fx-team: 6
* try: 4
* mozilla-central: 3
* ash: 1

Platform breakdown:
* linux64: 23
* linux32: 2

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1248078&startday=2016-04-04&endday=2016-04-10&tree=all
Blocks: 1248084
See Also: bug 1248084
The test failures mentioned in comment 10 should be fixed by the patch on bug 1262432. I'll rebase this and see what happens.
status-firefox48: --- → affected
OS: Unspecified → All
Version: Trunk → 47 Branch
The try push seems ok: https://treeherder.mozilla.org/#/jobs?repo=try&revision=374e91e15379&group_state=expanded - there's some M-4 failures in non-e10s but they look unrelated (specially since non-e10s means this code isn't getting exercised at all). I'll put up the rebased patch for review again.
Comment on attachment 8727973 [details]
MozReview Request: Bug 1248078 - Remove the dedicated scroll acknowledgement message. r?botond

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/38715/diff/1-2/
Attachment #8727973 - Flags: review?(botond)

Updated

2 years ago
Attachment #8727973 - Flags: review?(botond)
Comment on attachment 8727973 [details]
MozReview Request: Bug 1248078 - Remove the dedicated scroll acknowledgement message. r?botond

https://reviewboard.mozilla.org/r/38715/#review43415

::: gfx/layers/apz/src/AsyncPanZoomController.cpp
(Diff revision 2)
>    APZC_LOG("%p updating scroll offset from %s to %s\n", this,
>        Stringify(mFrameMetrics.GetScrollOffset()).c_str(),
>        Stringify(aScrollOffset).c_str());
>  
>    mFrameMetrics.UpdateScrollInfo(aScrollGeneration, aScrollOffset);
> -  AcknowledgeScrollUpdate();

This will need to be rebased against bug 1257641, which removes NotifyScrollUpdated().

::: gfx/layers/apz/util/APZCCallbackHelper.cpp:245
(Diff revision 2)
>      // If the pres shell resolution has changed on the content side side
>      // the time this repaint request was fired, consider this request out of date
>      // and drop it; setting a zoom based on the out-of-date resolution can have
>      // the effect of getting us stuck with the stale resolution.
>      if (presShellResolution != aMetrics.GetPresShellResolution()) {
>        return;

If we hit this early return, ResetScrollInfoIfGeneration() will not be called. Might that be a problem?
https://reviewboard.mozilla.org/r/38715/#review43415

> If we hit this early return, ResetScrollInfoIfGeneration() will not be called. Might that be a problem?

Hm, good question, I hadn't considered that. However, thinking about it, I think it shouldn't be a problem because the next repaint request that does successfully go through this code will do the reset. If *that* repaint request has a stale generation, then it should basically have the same effect as the code without this patch. (It means that the main thread advanced to a new generation between the two repaint requests, and in either case, the repaint request will not clobber the main thread scroll position).

Comment 21

2 years ago
32 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* mozilla-inbound: 11
* try: 8
* fx-team: 5
* mozilla-aurora: 4
* mozilla-central: 2
* ash: 2

Platform breakdown:
* linux64: 22
* linux32: 5
* windowsxp: 2
* windows7-32: 2
* osx-10-10: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1248078&startday=2016-04-11&endday=2016-04-17&tree=all
Comment on attachment 8727973 [details]
MozReview Request: Bug 1248078 - Remove the dedicated scroll acknowledgement message. r?botond

Restoring r? since I think I addressed your concern in comment 20. Let me know if that's not the case.
Attachment #8727973 - Flags: review?(botond)
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #20)
> > If we hit this early return, ResetScrollInfoIfGeneration() will not be called. Might that be a problem?
> 
> Hm, good question, I hadn't considered that. However, thinking about it, I
> think it shouldn't be a problem because the next repaint request that does
> successfully go through this code will do the reset. If *that* repaint
> request has a stale generation, then it should basically have the same
> effect as the code without this patch. (It means that the main thread
> advanced to a new generation between the two repaint requests, and in either
> case, the repaint request will not clobber the main thread scroll position).

That makes sense. Thanks!

Updated

2 years ago
Attachment #8727973 - Flags: review?(botond) → review+
Comment on attachment 8727973 [details]
MozReview Request: Bug 1248078 - Remove the dedicated scroll acknowledgement message. r?botond

https://reviewboard.mozilla.org/r/38715/#review43677

Comment 26

2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/1fd853200c99
Status: NEW → RESOLVED
Last Resolved: 2 years ago
status-firefox48: affected → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
No longer blocks: 1248084
Duplicate of this bug: 1248084
I don't see any more of these in orange factor after this landed.
Depends on: 1266755
I backed this out for talos regressions that were unexpected, see bug 1266755 comment 4 for details.

https://hg.mozilla.org/integration/mozilla-inbound/rev/a45b816fdd30
Status: RESOLVED → REOPENED
Resolution: FIXED → ---

Comment 30

2 years ago
7 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* mozilla-inbound: 4
* mozilla-aurora: 2
* mozilla-central: 1

Platform breakdown:
* linux64: 6
* linux32: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1248078&startday=2016-04-18&endday=2016-04-24&tree=all

Comment 32

2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/8d62ba5ddb0b
Status: REOPENED → RESOLVED
Last Resolved: 2 years ago2 years ago
status-firefox49: --- → fixed
Resolution: --- → FIXED
Target Milestone: mozilla48 → mozilla49
status-firefox48: fixed → affected
Probably too risky to uplift to 47, but we can let it bake some more and then decide about 48.
status-firefox47: affected → wontfix

Comment 34

2 years ago
10 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* mozilla-inbound: 5
* fx-team: 3
* try: 2

Platform breakdown:
* linux64: 9
* android-4-3-armv7-api15: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1248078&startday=2016-04-25&endday=2016-05-01&tree=all
Comment on attachment 8727973 [details]
MozReview Request: Bug 1248078 - Remove the dedicated scroll acknowledgement message. r?botond

Approval Request Comment
[Feature/regressing bug #]: part of APZ
[User impact if declined]: possible to hit a rare race condition that leaves the user at an incorrect scroll offset, if the page is changing while they're scrolling. in practice it's unlikely that most people would even notice though.
[Describe test coverage new/current, TreeHerder]: there is a test that was failing intermittently, this patch fixes it
[Risks and why]: fairly low risk. the patch is mostly code deletion and it's been baking for a few weeks with no known regressions.
[String/UUID change made/needed]: none
Attachment #8727973 - Flags: approval-mozilla-aurora?
Comment on attachment 8727973 [details]
MozReview Request: Bug 1248078 - Remove the dedicated scroll acknowledgement message. r?botond

APZ issue, should fix a test failure in aurora
Attachment #8727973 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
I'm hitting a conflict uplifting this to aurora.

hg graft -er 8d62ba5ddb0b
grafting 340127:8d62ba5ddb0b "Bug 1248078 - Remove the dedicated scroll acknowledgement message. r=botond"
merging gfx/layers/apz/util/APZCCallbackHelper.cpp
warning: conflicts while merging gfx/layers/apz/util/APZCCallbackHelper.cpp! (edit, then use 'hg resolve --mark')
abort: unresolved conflicts, can't continue
(use hg resolve and hg graft --continue)
Flags: needinfo?(bugmail.mozilla)

Comment 38

2 years ago
bugherderuplift
https://hg.mozilla.org/releases/mozilla-aurora/rev/fce8621653ed
status-firefox48: affected → fixed
Flags: needinfo?(bugmail.mozilla)

Comment 39

2 years ago
5 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* mozilla-beta: 3
* try: 1
* mozilla-aurora: 1

Platform breakdown:
* linux64: 5

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1248078&startday=2016-05-16&endday=2016-05-22&tree=all
You need to log in before you can comment on or make changes to this bug.