Closed Bug 1685004 Opened 3 years ago Closed 3 years ago

Intermittent gfx/layers/apz/test/mochitest/test_group_touchevents-2.html | helper_long_tap.html | Got a mouselongtap - got "mouseover", expected "mouselongtap"

Categories

(Core :: Panning and Zooming, defect, P5)

defect

Tracking

()

RESOLVED FIXED
88 Branch
Tracking Status
firefox88 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: botond)

Details

(Keywords: intermittent-failure, Whiteboard: [retriggered])

Attachments

(1 file, 1 obsolete file)

Filed by: nbeleuzu [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=325798909&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/UyVBOzJeQUuVm4LyI9vR6g/runs/0/artifacts/public/logs/live_backing.log


[task 2021-01-05T00:10:27.152Z] 00:10:27     INFO - TEST-PASS | gfx/layers/apz/test/mochitest/test_group_touchevents-2.html | helper_long_tap.html | Got a touchstart 
[task 2021-01-05T00:10:27.152Z] 00:10:27     INFO - Buffered messages finished
[task 2021-01-05T00:10:27.152Z] 00:10:27     INFO - TEST-UNEXPECTED-FAIL | gfx/layers/apz/test/mochitest/test_group_touchevents-2.html | helper_long_tap.html | Got a mouselongtap - got "mouseover", expected "mouselongtap"
[task 2021-01-05T00:10:27.152Z] 00:10:27     INFO -     SimpleTest.is@SimpleTest/SimpleTest.js:500:14
[task 2021-01-05T00:10:27.152Z] 00:10:27     INFO -     spawnTest/w.is@gfx/layers/apz/test/mochitest/apz_test_utils.js:440:18
[task 2021-01-05T00:10:27.152Z] 00:10:27     INFO -     recordEvent@gfx/layers/apz/test/mochitest/helper_long_tap.html:46:11
[task 2021-01-05T00:10:27.152Z] 00:10:27     INFO - Not taking screenshot here: see the one that was previously logged
[task 2021-01-05T00:10:27.153Z] 00:10:27     INFO - TEST-UNEXPECTED-FAIL | gfx/layers/apz/test/mochitest/test_group_touchevents-2.html | helper_long_tap.html | Got a touchend - got "mouseenter", expected "touchend"
[task 2021-01-05T00:10:27.153Z] 00:10:27     INFO -     SimpleTest.is@SimpleTest/SimpleTest.js:500:14
[task 2021-01-05T00:10:27.153Z] 00:10:27     INFO -     spawnTest/w.is@gfx/layers/apz/test/mochitest/apz_test_utils.js:440:18
[task 2021-01-05T00:10:27.153Z] 00:10:27     INFO -     recordEvent@gfx/layers/apz/test/mochitest/helper_long_tap.html:49:17
[task 2021-01-05T00:10:27.153Z] 00:10:27     INFO - Not taking screenshot here: see the one that was previously logged
[task 2021-01-05T00:10:27.153Z] 00:10:27     INFO - TEST-UNEXPECTED-FAIL | gfx/layers/apz/test/mochitest/test_group_touchevents-2.html | helper_long_tap.html | Got a mouseover - got "touchend", expected "mouseover"
[task 2021-01-05T00:10:27.153Z] 00:10:27     INFO -     SimpleTest.is@SimpleTest/SimpleTest.js:500:14
[task 2021-01-05T00:10:27.153Z] 00:10:27     INFO -     spawnTest/w.is@gfx/layers/apz/test/mochitest/apz_test_utils.js:440:18
[task 2021-01-05T00:10:27.153Z] 00:10:27     INFO -     recordEvent@gfx/layers/apz/test/mochitest/helper_long_tap.html:50:17
[task 2021-01-05T00:10:27.153Z] 00:10:27     INFO - GECKO(4512) | [Child 5792, Main Thread] WARNING: NS_ENSURE_TRUE(info) failed: file /builds/worker/checkouts/gecko/extensions/permissions/PermissionDelegateHandler.cpp:348
[task 2021-01-05T00:10:27.153Z] 00:10:27     INFO - GECKO(4512) | [Child 5792, Main Thread] WARNING: NS_ENSURE_TRUE(info) failed: file /builds/worker/checkouts/gecko/extensions/permissions/PermissionDelegateHandler.cpp:348
[task 2021-01-05T00:10:27.154Z] 00:10:27     INFO - Not taking screenshot here: see the one that was previously logged
[task 2021-01-05T00:10:27.154Z] 00:10:27     INFO - TEST-UNEXPECTED-FAIL | gfx/layers/apz/test/mochitest/test_group_touchevents-2.html | helper_long_tap.html | Got a mouseenter - got "mousemove", expected "mouseenter"
[task 2021-01-05T00:10:27.154Z] 00:10:27     INFO -     SimpleTest.is@SimpleTest/SimpleTest.js:500:14
[task 2021-01-05T00:10:27.154Z] 00:10:27     INFO -     spawnTest/w.is@gfx/layers/apz/test/mochitest/apz_test_utils.js:440:18
[task 2021-01-05T00:10:27.154Z] 00:10:27     INFO -     recordEvent@gfx/layers/apz/test/mochitest/helper_long_tap.html:51:17
[task 2021-01-05T00:10:27.154Z] 00:10:27     INFO - GECKO(4512) | helper_long_tap.html | got MozAfterPaint: 0,0,0,0
[task 2021-01-05T00:10:27.154Z] 00:10:27     INFO - GECKO(4512) | helper_long_tap.html | Dispatching 0 onpaint listeners
[task 2021-01-05T00:10:27.154Z] 00:10:27     INFO - Not taking screenshot here: see the one that was previously logged
[task 2021-01-05T00:10:27.154Z] 00:10:27     INFO - TEST-UNEXPECTED-FAIL | gfx/layers/apz/test/mochitest/test_group_touchevents-2.html | helper_long_tap.html | Got a mousemove - got "mouseout", expected "mousemove"
[task 2021-01-05T00:10:27.154Z] 00:10:27     INFO -     SimpleTest.is@SimpleTest/SimpleTest.js:500:14
[task 2021-01-05T00:10:27.155Z] 00:10:27     INFO -     spawnTest/w.is@gfx/layers/apz/test/mochitest/apz_test_utils.js:440:18
[task 2021-01-05T00:10:27.155Z] 00:10:27     INFO -     recordEvent@gfx/layers/apz/test/mochitest/helper_long_tap.html:52:17
[task 2021-01-05T00:10:27.155Z] 00:10:27     INFO - Not taking screenshot here: see the one that was previously logged
[task 2021-01-05T00:10:27.155Z] 00:10:27     INFO - TEST-UNEXPECTED-FAIL | gfx/layers/apz/test/mochitest/test_group_touchevents-2.html | helper_long_tap.html | Got a contextmenu - got "mouseleave", expected "contextmenu"
[task 2021-01-05T00:10:27.155Z] 00:10:27     INFO -     SimpleTest.is@SimpleTest/SimpleTest.js:500:14
[task 2021-01-05T00:10:27.155Z] 00:10:27     INFO -     spawnTest/w.is@gfx/layers/apz/test/mochitest/apz_test_utils.js:440:18
[task 2021-01-05T00:10:27.155Z] 00:10:27     INFO -     recordEvent@gfx/layers/apz/test/mochitest/helper_long_tap.html:53:17```

Hi Steven, can the recent failures here be from bug 1677774 ?

Flags: needinfo?(smacleod)
Whiteboard: [retriggered][stockwell needswork:owner]

Update:

There have been 43 failures within the last 7 days:

  • 24 failures on Windows 10 x64 asan
  • 19 failures on Windows 7 debug

Recent failure log: https://treeherder.mozilla.org/logviewer?job_id=329134595&repo=autoland&lineNumber=18009

[task 2021-02-07T07:08:31.143Z] 07:08:31 INFO - TEST-PASS | gfx/layers/apz/test/mochitest/test_group_touchevents-2.html | helper_long_tap.html | Got a touchstart
[task 2021-02-07T07:08:31.143Z] 07:08:31 INFO - Buffered messages finished
[task 2021-02-07T07:08:31.143Z] 07:08:31 INFO - TEST-UNEXPECTED-FAIL | gfx/layers/apz/test/mochitest/test_group_touchevents-2.html | helper_long_tap.html | Got a mouselongtap - got "mouseover", expected "mouselongtap"
[task 2021-02-07T07:08:31.143Z] 07:08:31 INFO - SimpleTest.is@SimpleTest/SimpleTest.js:500:14
[task 2021-02-07T07:08:31.143Z] 07:08:31 INFO - spawnTest/w.is@gfx/layers/apz/test/mochitest/apz_test_utils.js:423:18
[task 2021-02-07T07:08:31.143Z] 07:08:31 INFO - recordEvent@gfx/layers/apz/test/mochitest/helper_long_tap.html:46:11
[task 2021-02-07T07:08:31.144Z] 07:08:31 INFO - Not taking screenshot here: see the one that was previously logged

(In reply to Andreea Pavel [:apavel] from comment #2)

Hi Steven, can the recent failures here be from bug 1677774 ?

I don't think so, unless this used to be an intermittent crash? The most recent changes from bug 1677774 shouldn't change behaviour other than continuing where we used to crash.

Flags: needinfo?(smacleod)

@Steven, thank you for checking.
@Botond, can you assign someone to take a look?

Flags: needinfo?(botond)

Sure, I will take a look.

Assignee: nobody → botond
Flags: needinfo?(botond)

Update:

There have been 48 failures within the last 7 days:

  • 22 failures on Windows 7 debug
  • 26 failures on Windows 10 x64 asan

Recent failure log: https://treeherder.mozilla.org/logviewer?job_id=330561739&repo=autoland&lineNumber=11643

It looks like the test is expecting a very specific sequence on events during a long touch-tap, which (on Windows) is:

  1. (test code sends TOUCH_CONTACT)
  2. touchstart
  3. mouselongtap (test code now sends TOUCH_REMOVE)
  4. touchend
  5. mouseover
  6. mouseenter
  7. mousemove
  8. contextmenu

When an intermittent failure occurs, it instead receives this sequence:

  1. (test code sends TOUCH_CONTACT)
  2. touchstart
  3. mouseover (test code sends TOUCH_REMOVE anyways)
  4. mouseenter
  5. touchend
  6. mousemove
  7. mouseout
  8. mouseleave

(It may still receive a contextmenu after all this, but it stops listening at this point so it doesn't show in the logs.)

Masayuki, are you aware of any recent change (perhaps around early January, when this intermittent started) related to mouse event handling which could occasionally cause this modified sequence of events?

Flags: needinfo?(masayuki)

I checked review requests around early January, but I have no idea. edgar, do you?

Flags: needinfo?(masayuki) → needinfo?(echen)

After taking another look, I don't think there's anything fishy about the mouse event sequence. "mouseout" and "mouseleave" look like the counterparts to "mouseover" and "mouseenter" which we would likely also see in the success case if we didn't stop listening.

The salient difference between the sequences in the success vs. failure cases is the absence of the "mouselongtap". Some logging I've added suggests APZ is sending a single-tap rather than a long-tap, suggesting the issue is related to something like a stray touchend event, possibly from the previous test.

Flags: needinfo?(echen)

(In reply to Botond Ballo [:botond] from comment #13)

suggesting the issue is related to something like a stray touchend event, possibly from the previous test.

Further investigation suggests the touchend event resulting in a single-tap isn't a stray one from the previous test, but the one synthesized by the test itself.

The test asserts that the second event it receives is a mouselongtap, but then synthesizes a touchend regardless.

If it receives an unexpected non-mouselongtap as the second event -- such as a mouseover in this case -- it synthesizes that touchend prematurely, and turns the intended long-tap into a single-tap.

So, the issue seems to be getting a stray / unexpected mouseover. Not sure why that might be happening, but I'm also not sure that it's important for the purpose of the test. We might be able to just ignore it.

Couple of updates here:

  • The problematic mouseover is not an additional one, but rather the one we expect arrives earlier.
  • I'm not exactly sure why; I'm seeing calls to NotifyMouseOver() with aMouseEvent->mMessage being eMouseMove, as well as ePointerGotCapture just before the mouseover arrives. Note, however, that this is before an eMouseMove is generated by FireContextMenuEvents().

I think I should be able to work around this in the test, but it remains to be seen whether the workaround is a satisfactory solution.

(In reply to Botond Ballo [:botond] from comment #16)

as well as ePointerGotCapture just before the mouseover arrives.

This reminds me that I enabled implicit pointer capture for touch event around January, see bug 1682527, could you try whether setting dom.w3c_pointer_events.implicit_capture to false help, though I am no idea why it affects mouseover event.

(In reply to Edgar Chen [:edgar] from comment #17)

This reminds me that I enabled implicit pointer capture for touch event around January, see bug 1682527, could you try whether setting dom.w3c_pointer_events.implicit_capture to false help, though I am no idea why it affects mouseover event.

Yes, setting this pref to false does seem to resolve the intermittent failure.

(In reply to Botond Ballo [:botond] from comment #18)

Yes, setting this pref to false does seem to resolve the intermittent failure.

Thanks! So it seems like the ePointerGotCapture somehow triggers the mouseover event. I will take a look.

Botond/Edgar this has 180 total failures in the last 30 days. Should it get disabled until there is a fix or leave it enabled?

Please give me few more days, I am actively debugging this. :-)

(In reply to Edgar Chen [:edgar] from comment #21)

Please give me few more days, I am actively debugging this. :-)

Sure, thanks for working on this!

(In reply to Edgar Chen [:edgar] from comment #19)

So it seems like the ePointerGotCapture somehow triggers the mouseover event. I will take a look.

Err, mouseover isn't from the ePointerGotCapture, it is from a synthesized mousemove event.

Edgar, I posted a patch which works around the failure.

From an APZ point of view, I think the test is still valid with this change. However, I will leave it up to you if you find it a satisfying fix, or would like to investigate the underlying issue further. (In the latter case, we could still consider landing this patch as a temporary workaround.)

Thanks, Botond. I was worried about if bug 1682527 causes any regression, but it seems not the case.

The synthesized mousemove is scheduled for layout reflow, I guess there is some timing change after bug 1682527, so make the synthesized mousemove from reflow could possibly happen before the mouselongtap.

Another option is to disable synthesizing mousemove from reflow by setting layout.reflow.synthMouseMove to false to ensure mouseover is generated by the synthesized mousemove for context menu.

Attachment #9205263 - Attachment is obsolete: true

(In reply to Edgar Chen [:edgar] from comment #26)

Another option is to disable synthesizing mousemove from reflow by setting layout.reflow.synthMouseMove to false to ensure mouseover is generated by the synthesized mousemove for context menu.

Thanks, let's go with this approach as it's less invasive and allows the test to continue to make more precise assertions.

Here is a Try push showing that it resolves the intermittent.

Pushed by bballo@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5d06025b04a5
Disable layout.reflow.synthMouseMove for helper_long_tap.html. r=edgar
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 88 Branch
Status: RESOLVED → REOPENED
Keywords: leave-open
Resolution: FIXED → ---
Whiteboard: [retriggered][stockwell needswork:owner] → [retriggered][stockwell disabled]
Target Milestone: 88 Branch → ---

The patch that landed did not disable the test, it fixed the intermittent by setting the pref layout.reflow.synthMouseMove=false, which does not affect the validity of the test.

So, unless I'm misunderstanding the reason this was reopened, I think this can be closed.

Status: REOPENED → RESOLVED
Closed: 3 years ago3 years ago
Resolution: --- → FIXED

(In reply to Botond Ballo [:botond] from comment #32)

The patch that landed did not disable the test, it fixed the intermittent by setting the pref layout.reflow.synthMouseMove=false, which does not affect the validity of the test.

So, unless I'm misunderstanding the reason this was reopened, I think this can be closed.

Ah, thank you for clarifying.

Whiteboard: [retriggered][stockwell disabled] → [retriggered]
Target Milestone: --- → 88 Branch
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: