Closed Bug 1237905 Opened 4 years ago Closed 4 years ago

Intermittent e10s test_wheel_transactions.html | Test timed out

Categories

(Core :: Panning and Zooming, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla47
Tracking Status
e10s + ---
firefox45 --- wontfix
firefox46 --- fixed
firefox47 --- fixed
firefox-esr45 --- unaffected

People

(Reporter: philor, Assigned: botond)

References

(Blocks 1 open bug)

Details

(Keywords: intermittent-failure, Whiteboard: [gfx-noted])

Attachments

(2 files, 1 obsolete file)

Old intermittents never die. They just wait for the right opportunity to crop up again.
Assignee: nobody → botond
Summary: Intermittent test_wheel_transactions.html | Test timed out → Intermittent e10s test_wheel_transactions.html | Test timed out
Blocks: e10s-tests
tracking-e10s: --- → +
Blanket gfx-noted on APZ intermittent failures so they can be found more easily on https://brasstacks.mozilla.com/orangefactor/
Whiteboard: [gfx-noted]
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #3)
> Blanket gfx-noted on APZ intermittent failures so they can be found more
> easily on https://brasstacks.mozilla.com/orangefactor/

Unfortunately, I couldn't figure out what's going wrong from this logging. I did another push which additionally has test-specific logging:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=1d00db71cc12
(In reply to Botond Ballo [:botond] from comment #5)
> I did another push which additionally has test-specific logging:
> 
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=1d00db71cc12

This log is showing that the very first native scroll wheel event synthesized by the test never results in a subsequent 'scroll' event. Here's another push with more logging to try to see where in the pipeline things go wrong:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=07e66d77ec15
(In reply to Botond Ballo [:botond] from comment #6)
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=07e66d77ec15

We're getting as far as the ScrollToCSSPixelsApproximate() call in APZCCallbackHelper. Tracing further:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=804c3446cfad
(In reply to Botond Ballo [:botond] from comment #7)
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=804c3446cfad

150 retriggers and no repros :(
Going to try a Windows push instead, maybe the issue occurs more frequently there (the automation job failures reported in comment 2 are all on Windows):

https://treeherder.mozilla.org/#/jobs?repo=try&revision=045d6482009e
(In reply to Botond Ballo [:botond] from comment #9)
> Going to try a Windows push instead, maybe the issue occurs more frequently
> there (the automation job failures reported in comment 2 are all on Windows):
> 
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=045d6482009e

No luck so far; maybe I'll have better luck today (also rebased to latest m-c):

https://treeherder.mozilla.org/#/jobs?repo=try&revision=e4e01f342ad3
I haven't had luck repro'ing on Win32, but I was able to get a repro on Linux [1], which told me that the reason we're not getting a 'scroll' event is that at the time we're expecting it to be triggered, the current scroll position is already the target scroll position. I'm continuing to investigate with more logging.

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=dbcd40e03ada
(In reply to Botond Ballo [:botond] from comment #7)
> We're getting as far as the ScrollToCSSPixelsApproximate() call in
> APZCCallbackHelper.

(In reply to Botond Ballo [:botond] from comment #12)
> the reason we're not getting a 'scroll' event
> is that at the time we're expecting it to be triggered, the current scroll
> position is already the target scroll position.

I was misled; the ScrollToCSSPixelsApproximate() call we were getting in APZCCallbackHelper was for a different scroll frame, which was indeed already at its target scroll position.

The update that we are interested is getting dropped because the scroll frame it pertains to is going away for some reason; I'm trying to investigate why, but am again having problems reproducing the failure [1].

Due to the very intermittent nature of this failure, I think I'll try reproducing it locally and getting an rr recording for it instead, and reproducing it in automation is very hit-and-miss.

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=1e69f1dd57b8
(In reply to Botond Ballo [:botond] from comment #14)
> Due to the very intermittent nature of this failure, I think I'll try
> reproducing it locally and getting an rr recording for it instead, and
> reproducing it in automation is very hit-and-miss.

I haven't had any luck with a local repro, either.

As a next step, I will try landing some of my logging (enabled for this test only), and examining the logs produced by the failures on m-i etc. that are still happening. (Thanks Kats for the suggestion!)
(In reply to Botond Ballo [:botond] from comment #15)
> As a next step, I will try landing some of my logging (enabled for this test
> only), and examining the logs produced by the failures on m-i etc. that are
> still happening. (Thanks Kats for the suggestion!)

Here's a Try push with the logging patch that I intend to land (only), to verify that it's not causing other tests to be spammed with logging:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=ce7e86cfe0db
(In reply to Botond Ballo [:botond] from comment #16)
> (In reply to Botond Ballo [:botond] from comment #15)
> > As a next step, I will try landing some of my logging (enabled for this test
> > only), and examining the logs produced by the failures on m-i etc. that are
> > still happening. (Thanks Kats for the suggestion!)
> 
> Here's a Try push with the logging patch that I intend to land (only), to
> verify that it's not causing other tests to be spammed with logging:
> 
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=ce7e86cfe0db

Looking good (also verified with a Try push that doesn't use '--tag apz' [1]). Posting for review.

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=5e16fa7c4e62
(MozReview is not in an accepting mood right now.)
Attachment #8713912 - Flags: review?(bugmail.mozilla)
Attachment #8713912 - Flags: review?(bugmail.mozilla) → review+
Marking leave-open as this is just a logging patch, not a fix.
Keywords: leave-open
Well I feel silly for not having thought of this sooner, but from the logs we've collected from automation, it appears that the problem is the same as in bug 1203901 and bug 1194546: we are synthesizing native events before the first paint of the new page content for the test reaches the compositor, so APZ is interpreting the events as applying to the old content. It tells Gecko to scroll the old content, which Gecko ignores as that content is no longer in the DOM, and the test hangs waiting for the new content to receive a 'scroll' event.
Attachment #8718164 - Flags: review?(bugmail.mozilla) → review+
Comment on attachment 8718164 [details]
MozReview Request: Bug 1237905 - Ensure the page is painted before synthesizing native events. r=kats

https://reviewboard.mozilla.org/r/34439/#review31161
Leaving 'leave-open' keyword until BrassTacks confirms that intermittent is gone. I will also land a follow-up to remove the logging patch that landed in comment 19.
(In reply to Botond Ballo [:botond] from comment #29)
> Leaving 'leave-open' keyword until BrassTacks confirms that intermittent is
> gone. 

Looking at [1], there have been no occurrences of this intermittent on trees where the fix has landed. I think we can call this fixed!

[1] https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1237905&startday=2016-02-08&tree=all

> I will also land a follow-up to remove the logging patch that landed
> in comment 19.

Coming up.
Keywords: leave-open
Attachment #8718164 - Attachment is obsolete: true
Comment on attachment 8721803 [details]
MozReview Request: Bug 1237905 - Remove logging added temporarily to test_wheel_transactions to diagnose an intermittent failure. r=kats

https://reviewboard.mozilla.org/r/35791/#review32413

Thanks!
Attachment #8721803 - Flags: review?(bugmail.mozilla) → review+
https://hg.mozilla.org/mozilla-central/rev/0bf10308ab31
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
Whiteboard: [gfx-noted] → [gfx-noted][checkin-needed-aurora]
https://hg.mozilla.org/releases/mozilla-aurora/rev/fcb61247e426
Whiteboard: [gfx-noted][checkin-needed-aurora] → [gfx-noted]
You need to log in before you can comment on or make changes to this bug.