Closed Bug 1285176 Opened 3 years ago Closed 3 years ago

Intermittent toolkit/content/tests/mochitest/test_mousecapture.xhtml | selection scroll position after timer is not more than 320

Categories

(Toolkit :: General, defect, P3)

defect

Tracking

()

RESOLVED FIXED
mozilla53
Tracking Status
firefox50 --- unaffected
firefox51 --- wontfix
firefox52 --- fixed
firefox53 --- fixed

People

(Reporter: intermittent-bug-filer, Unassigned)

References

Details

(Keywords: intermittent-failure, Whiteboard: [stockwell fixed])

Attachments

(2 files)

Bulk assigning P3 to all open intermittent bugs without a priority set in Firefox components per bug 1298978.
Priority: -- → P3
this started spiking January 4th on mochitest-e10s for linux* debug.

on inbound, this seems to start with a merge from autoland:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-searchStr=linux%20debug%20mochitest-e10s-10&tochange=6e1c9517001b69c3c514267690f3d9727a2f45b6&fromchange=9e96dcbf938f37d1537b66c480ccd8ff32fd9262

those map to this autoland revisision range:
https://treeherder.mozilla.org/#/jobs?repo=autoland&filter-searchStr=mochitest-e10s-10%20linux%20debug&tochange=56915773d15a&fromchange=17caae57fcd0

I did some retriggers on the above range.

I see that :jaws did some eslint work which landed roughly around the time of initial failure:
https://hg.mozilla.org/mozilla-central/diff/06698dfb3784/toolkit/content/tests/mochitest/test_mousecapture.xhtml

although to be honest I cannot see how this change in { } style would affect the test to be intermittent.

:gijs, I see you are the author of the test, can you help figure out why this is problematic?
Flags: needinfo?(gijskruitbosch+bugs)
I don't see this showing up easily on that specific range in autoland :(
I mostly just converted it from the chrome one. I know kats looked at this to fix earlier intermittents to do with the scroll position - more details in bug 1258896. Maybe the limit needs upping? It's hard to know without seeing the value that we're comparing against... I wonder why we don't just info() that. :-\

:kats, do you see something relevant in the autoland range, or do you know of something related that landed in inbound around this time?
Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(bugmail)
seems to be more on inbound and not on the autoland range after doing a larger volume of retriggers- I will continue to retrigger and see if I can get to a more specific range.
many more retriggers later:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-searchStr=linux%20debug%20mochitest-e10s-10&tochange=6e1c9517001b69c3c514267690f3d9727a2f45b6&fromchange=9ead32283c57376a221cc0b16d3a23760ef54b63

and I found the root cause:
https://hg.mozilla.org/integration/mozilla-inbound/rev/9e96dcbf938f37d1537b66c480ccd8ff32fd9262

this is from bug 1323319.  :mconley, I see you made the code changes here, can you take a look at this and see why this is happening- maybe we can adjust the patch, or maybe we have to adjust the test
Flags: needinfo?(mconley)
(In reply to :Gijs from comment #12)
> I mostly just converted it from the chrome one. I know kats looked at this
> to fix earlier intermittents to do with the scroll position - more details
> in bug 1258896. Maybe the limit needs upping? It's hard to know without
> seeing the value that we're comparing against... I wonder why we don't just
> info() that. :-\

Yeah, we should stick that value in the assertion message. I agree that we might just need the limit upped a bit.

> :kats, do you see something relevant in the autoland range, or do you know
> of something related that landed in inbound around this time?

Looks like Joel tracked down the regressing changeset, although I have no idea why that change would cause this failure.
Flags: needinfo?(bugmail)
The patch in bug 1323319 causes us to force paint the first appearance of a new window or tab when in e10s mode, and is part of the effort to kill tab switch spinners.

I notice that this test opens both a new tab and a new window... perhaps it's not waiting for the page to be interactive before running some step?

I'm seeing the following non-fatal assertions in the log before the test fails:

[task 2017-01-09T21:27:47.655098Z] 21:27:47     INFO - [Child 2617] ###!!! ASSERTION: Unexpected document: 'capturingContent->GetComposedDoc() == GetDocument()', file /home/worker/workspace/build/src/layout/base/PresShell.cpp, line 7287
[task 2017-01-09T21:28:37.028660Z] 21:28:37     INFO - #01: nsViewManager::DispatchEvent [view/nsViewManager.cpp:815]
[task 2017-01-09T21:28:37.028923Z] 21:28:37     INFO - 
[task 2017-01-09T21:28:37.033129Z] 21:28:37     INFO - #02: nsView::HandleEvent [view/nsView.cpp:1125]
[task 2017-01-09T21:28:37.033332Z] 21:28:37     INFO - 
[task 2017-01-09T21:28:37.035637Z] 21:28:37     INFO - #03: mozilla::widget::PuppetWidget::DispatchEvent [widget/PuppetWidget.cpp:360]
[task 2017-01-09T21:28:37.037059Z] 21:28:37     INFO - 
[task 2017-01-09T21:28:37.039689Z] 21:28:37     INFO - #04: nsContentUtils::SendMouseEvent [dom/base/nsContentUtils.cpp:8324]
[task 2017-01-09T21:28:37.041830Z] 21:28:37     INFO - 
[task 2017-01-09T21:28:37.044966Z] 21:28:37     INFO - #05: nsDOMWindowUtils::SendMouseEventCommon [dom/base/nsDOMWindowUtils.cpp:718]
[task 2017-01-09T21:28:37.047316Z] 21:28:37     INFO - 
[task 2017-01-09T21:28:37.051539Z] 21:28:37     INFO - #06: NS_InvokeByIndex
[task 2017-01-09T21:28:37.053674Z] 21:28:37     INFO - 
[task 2017-01-09T21:28:37.056257Z] 21:28:37     INFO - [Child 2617] ###!!! ASSERTION: Unexpected document: 'capturingContent->GetComposedDoc() == GetDocument()', file /home/worker/workspace/build/src/layout/base/PresShell.cpp, line 7432
[task 2017-01-09T21:28:37.058740Z] 21:28:37     INFO - #01: nsViewManager::DispatchEvent [view/nsViewManager.cpp:815]
[task 2017-01-09T21:28:37.063693Z] 21:28:37     INFO - 
[task 2017-01-09T21:28:37.066006Z] 21:28:37     INFO - #02: nsView::HandleEvent [view/nsView.cpp:1125]
[task 2017-01-09T21:28:37.067915Z] 21:28:37     INFO - 
[task 2017-01-09T21:28:37.069706Z] 21:28:37     INFO - #03: mozilla::widget::PuppetWidget::DispatchEvent [widget/PuppetWidget.cpp:360]
[task 2017-01-09T21:28:37.071564Z] 21:28:37     INFO - 
[task 2017-01-09T21:28:37.075278Z] 21:28:37     INFO - #04: nsContentUtils::SendMouseEvent [dom/base/nsContentUtils.cpp:8324]
[task 2017-01-09T21:28:37.077392Z] 21:28:37     INFO - 
[task 2017-01-09T21:28:37.080003Z] 21:28:37     INFO - #05: nsDOMWindowUtils::SendMouseEventCommon [dom/base/nsDOMWindowUtils.cpp:718]
[task 2017-01-09T21:28:37.084287Z] 21:28:37     INFO - 
[task 2017-01-09T21:28:37.086408Z] 21:28:37     INFO - #06: NS_InvokeByIndex
[task 2017-01-09T21:28:37.088160Z] 21:28:37     INFO - 
[task 2017-01-09T21:28:37.091948Z] 21:28:37     INFO - [Child 2617] ###!!! ASSERTION: Unexpected document: 'capturingContent->GetComposedDoc() == GetDocument()', file /home/worker/workspace/build/src/layout/base/PresShell.cpp, line 7287
[task 2017-01-09T21:28:37.094391Z] 21:28:37     INFO - #01: nsViewManager::DispatchEvent [view/nsViewManager.cpp:815]
[task 2017-01-09T21:28:37.096816Z] 21:28:37     INFO - 
[task 2017-01-09T21:28:37.101063Z] 21:28:37     INFO - #02: nsView::HandleEvent [view/nsView.cpp:1125]
[task 2017-01-09T21:28:37.103974Z] 21:28:37     INFO - 
[task 2017-01-09T21:28:37.105984Z] 21:28:37     INFO - #03: mozilla::widget::PuppetWidget::DispatchEvent [widget/PuppetWidget.cpp:360]
[task 2017-01-09T21:28:37.108296Z] 21:28:37     INFO - 
[task 2017-01-09T21:28:37.112801Z] 21:28:37     INFO - #04: nsContentUtils::SendMouseEvent [dom/base/nsContentUtils.cpp:8324]
[task 2017-01-09T21:28:37.114482Z] 21:28:37     INFO - 
[task 2017-01-09T21:28:37.116619Z] 21:28:37     INFO - #05: nsDOMWindowUtils::SendMouseEventCommon [dom/base/nsDOMWindowUtils.cpp:718]
[task 2017-01-09T21:28:37.118422Z] 21:28:37     INFO - 
[task 2017-01-09T21:28:37.121821Z] 21:28:37     INFO - #06: NS_InvokeByIndex
[task 2017-01-09T21:28:37.123663Z] 21:28:37     INFO - 
[task 2017-01-09T21:28:37.126171Z] 21:28:37     INFO - [Child 2617] ###!!! ASSERTION: Unexpected document: 'capturingContent->GetComposedDoc() == GetDocument()', file /home/worker/workspace/build/src/layout/base/PresShell.cpp, line 7432
[task 2017-01-09T21:28:37.128716Z] 21:28:37     INFO - #01: nsViewManager::DispatchEvent [view/nsViewManager.cpp:815]
[task 2017-01-09T21:28:37.130643Z] 21:28:37     INFO - 
[task 2017-01-09T21:28:37.132739Z] 21:28:37     INFO - #02: nsView::HandleEvent [view/nsView.cpp:1125]
[task 2017-01-09T21:28:37.134791Z] 21:28:37     INFO - 
[task 2017-01-09T21:28:37.136990Z] 21:28:37     INFO - #03: mozilla::widget::PuppetWidget::DispatchEvent [widget/PuppetWidget.cpp:360]
[task 2017-01-09T21:28:37.140049Z] 21:28:37     INFO - 
[task 2017-01-09T21:28:37.142039Z] 21:28:37     INFO - #04: nsContentUtils::SendMouseEvent [dom/base/nsContentUtils.cpp:8324]
[task 2017-01-09T21:28:37.145150Z] 21:28:37     INFO - 
[task 2017-01-09T21:28:37.147283Z] 21:28:37     INFO - #05: nsDOMWindowUtils::SendMouseEventCommon [dom/base/nsDOMWindowUtils.cpp:718]
[task 2017-01-09T21:28:37.149279Z] 21:28:37     INFO - 
[task 2017-01-09T21:28:37.151629Z] 21:28:37     INFO - #06: NS_InvokeByIndex

I'm interpreting this as "the mouse events are not being synthesized on the right document". Does any of this help illuminate what might be going wrong here? Uh... ni? kats again, although I suspect he just got roped into this for being a reviewer back in the day. Sorry, yo! :)
Flags: needinfo?(mconley) → needinfo?(bugmail)
(In reply to Mike Conley (:mconley) - Backed up on review / needinfo's from comment #19)
> I notice that this test opens both a new tab and a new window... perhaps
> it's not waiting for the page to be interactive before running some step?

This might be the case. It does look like the failure only happens in the "new tab" case and not the "new window" case.

> I'm seeing the following non-fatal assertions in the log before the test
> fails:

I don't think these are related, they seem to happen much earlier in the test, and also show up in passing runs.

Taking a step back, the intent of the test is to start a mouse-drag text selection such that the mouse button goes down on the page, then the mouse is moved outside the window area. This causes the page to scroll and continue the selection, and that's what the test is checking for (it checks that the text is selected and that the scroll position has advanced). Most of these checks are passing, so I'm fairly confident the correct window has focus and that the selection is happening as intended. The only problem is that the page is scrolling farther than we'd expect it to. I'm going to do a try push with more logging to see if there's anything really wacky going on.
Based on the output at [1], I suspect what's happening is that the initial scroll position of the page is wrong. It seems to be scrolling by 20px per iteration, but it starts off in the neighborhood of 300px when it should be at 0 (or 20, since the first scroll already happened before the print occurs).

mconley, is it possible that your patch is painting the page before the scroll position is properly reset? The scroll position restoration code is pretty brittle so this wouldn't surprise me. I can add more logging to see if that's the case.

[1] https://treeherder.mozilla.org/logviewer.html#?job_id=68048560&repo=try&lineNumber=11225
I ended up reproducing this locally with rr and logging. The page does load correctly with a scroll position of 0. What seems to be happening is that sometime after [1], there is what appears to be a long composite (I think). The content process keeps getting refresh driver ticks, but they keep hitting the check at [2] and bailing out. I believe this happens if a paint has been sent to the compositor and the compositor is still digesting it. As a result, the main-thread doesn't do any more paints until the compositor is done. However, the selection autoscroll code keeps merrily ticking away and scrolling by 20 pixels every 30ms. Because the scroll event is only fired when a paint is done, these scroll offset changes fly under the radar of the listener in the test, and the listener only finds out much later, after the main thread finally does a repaint. At that point the scroll position could be pretty large.

I'm attaching a sample output from my local run - I instrumented the code at [2] to log (mMostRecentTick - aNowTime).ToMilliseconds() and you can see that it climbs up to 2489.85 before it finally does a paint and kicks the scroll listener in the test.

So what this boils down to doesn't seem to be a bug per se, it's just that the timing of the paint got shifted, and the test is not robust enough to handle that. I think making the test more robust against this would probably be the correct fix. Or just taking out the test assertion that's failing.

[1] http://searchfox.org/mozilla-central/rev/225ab0637ed51b8b3f9f4ee2f9c339a37a65b626/toolkit/content/tests/mochitest/test_mousecapture.xhtml#105
[2] http://searchfox.org/mozilla-central/rev/225ab0637ed51b8b3f9f4ee2f9c339a37a65b626/layout/base/nsRefreshDriver.cpp#1742
Flags: needinfo?(bugmail)
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #22)
> mconley, is it possible that your patch is painting the page before the
> scroll position is properly reset? The scroll position restoration code is
> pretty brittle so this wouldn't surprise me. I can add more logging to see
> if that's the case.

Certainly possible. My patch makes it so that we can interrupt JS and force a paint for a newly created tab. If your test is expecting to have some script run before it has painted, it's possible the patch I landed it causes it to be pre-empted.
Added a reference to bug 1323319 which seems to have exacerbated this intermittent.
See Also: → 1323319
I've got an idea for a patch that might allow us to sidestep this (by waiting for assurance that content has painted once before running the scroll test).

Will push to try and report back.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #23)
> I'm attaching a sample output from my local run - I instrumented the code at
> [2] to log (mMostRecentTick - aNowTime).ToMilliseconds() and you can see
> that it climbs up to 2489.85 before it finally does a paint and kicks the
> scroll listener in the test.
> 
> So what this boils down to doesn't seem to be a bug per se, it's just that
> the timing of the paint got shifted, and the test is not robust enough to
> handle that. I think making the test more robust against this would probably
> be the correct fix. Or just taking out the test assertion that's failing.
> 

Can we determine that ~2500ms offset from the test script and use it to get a more accurate window of what the expected offset would be?

> [1]
> http://searchfox.org/mozilla-central/rev/
> 225ab0637ed51b8b3f9f4ee2f9c339a37a65b626/toolkit/content/tests/mochitest/
> test_mousecapture.xhtml#105
> [2]
> http://searchfox.org/mozilla-central/rev/
> 225ab0637ed51b8b3f9f4ee2f9c339a37a65b626/layout/base/nsRefreshDriver.cpp#1742
Flags: needinfo?(bugmail)
(In reply to :Gijs from comment #34)
> 
> Can we determine that ~2500ms offset from the test script and use it to get
> a more accurate window of what the expected offset would be?

Possibly, yes, but I think it would probably not be very robust. My suggestion is to just remove the failing assertion. There's no real need to put an upper bound on the scroll position in the first place. As long as the page scrolled (by a multiple of 20) we can be reasonably sure that the scroll-during-selection code is working.
Flags: needinfo?(bugmail)
Comment on attachment 8827968 [details]
Bug 1285176 - fix intermittent test_mousecapture.xhtml by removing a check whose success we can't guarantee,

https://reviewboard.mozilla.org/r/105532/#review106344

Thanks!
Attachment #8827968 - Flags: review?(bugmail) → review+
Pushed by gijskruitbosch@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/99e0fc974cce
fix intermittent test_mousecapture.xhtml by removing a check whose success we can't guarantee, r=kats
https://hg.mozilla.org/mozilla-central/rev/99e0fc974cce
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
Comment on attachment 8827968 [details]
Bug 1285176 - fix intermittent test_mousecapture.xhtml by removing a check whose success we can't guarantee,

Approval Request Comment
[Feature/Bug causing the regression]:

Bug 1323319

[User impact if declined]:

None.

[Is this code covered by automated tests?]:

Yes.

[Has the fix been verified in Nightly?]:

Yes.

[Needs manual test from QE? If yes, steps to reproduce]: 

None.

[List of other uplifts needed for the feature/fix]:

None.

[Is the change risky?]:

No.

[Why is the change risky/not risky?]:

This addresses an intermittent test that was exacerbated by bug 1323319. We want to uplift this to keep the tree happy if bug 1323319 uplifts.

[String changes made/needed]:

None.
Attachment #8827968 - Flags: approval-mozilla-aurora?
Comment on attachment 8827968 [details]
Bug 1285176 - fix intermittent test_mousecapture.xhtml by removing a check whose success we can't guarantee,

Apparently this has already uplifted (comment 40)
Attachment #8827968 - Flags: approval-mozilla-aurora?
Whiteboard: [stockwell fixed]
You need to log in before you can comment on or make changes to this bug.