1720248 - Needs a better way to start mochitest which need to synthesize events via the main process

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Reporter

Description

•

3 years ago

Currently, synthesizing pointing device events requires APZ to have layout in the <iframe> in the test runner if:

"test.events.async.enabled" is set to true and synthesizing events.
synthesizing wheel events with synthesizeWheel in EventUtils.js.

If APZ does not have layout information of the subdocument yet, synthesized events are fired on the <iframe> element in the parent document. This causes timeouts especially in fission-xorigin tests.

For avoiding this issue, I added promiseElementReadyForUserInput into EventUtils.js, and make each test which may have this trouble use it before running its first test.

However, this makes harder to create mochitest for developers who are not familiar with this issue. So, we need better solution for this issue in the long term goal.

If it's not matter to wait cost in all mochitest which use SimpleTest.waitForFocus, doing this in it automatically is better. However, the cost and the performance in treeherder is important. So, I guess that it's not good solution for most tests which don't synthesize user input via the main process.

Hiroyuki Ikezoe (:hiro)

Comment 1

•

3 years ago

•

Edited

I was thinking it's the cause of bug 1695126 (test_sanityEventUtils.html failure), but hmm interestingly bug 1695126 has been marked as works-for-me...

Okay, found bug 1716411.

Comment 2

•

3 years ago

Andrew, do you think forcing SimpleTest.waitForFocus to wait for promiseElementReadyForUserInput for every mochitest would increase Treeherder test time too much or cause other problems?

Fission Milestone: --- → ?

Flags: needinfo?(ahal)

Andrew Halberstadt [:ahal]

Comment 3

•

3 years ago

I'll be honest, I have no idea :)

If there was a try push to compare I could maybe extract all the durations and do a comparison.

Flags: needinfo?(ahal)

Hiroyuki Ikezoe (:hiro)

Comment 4

•

3 years ago

Masayuki, can you please elaborate where/when we don't have the information in question?

the OOP iframe's document has been painted
APZ has received the OOP iframe document's information after 1)
the OOP iframe has received the information from APZ after 2)

If the test fails in between 2) and 3), I think we can defer event handling until we receive the information from APZ.

From a Botond's comment; https://phabricator.services.mozilla.com/D119596#3890712

In addition to waitUntilApzStable(), some APZ tests (e.g. this one perform waitUntilApzStable() and forceLayerTreeToCompositor()). I think the forceLayerTreeToCompositor() may be the important one in your case (if the issue is that the compositor doesn't know about the iframe document's layers yet).

The case Botond raised in this comment is, the test fails in between 1) and 2).

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Reporter

Comment 5

•

3 years ago

(In reply to Hiroyuki Ikezoe (:hiro) from comment #4)

Masayuki, can you please elaborate where/when we don't have the information in question?

the OOP iframe's document has been painted

APZ has received the OOP iframe document's information after 1)

the OOP iframe has received the information from APZ after 2)

Honestly, I'm not sure. The failure occurred only in opt builds, and if doing something output for debugging, the race was changed and couldn't investigate actually. So, I just tried various approach to log it (including crash itself in specific condition) and the detail of other things are not clear to me.

From a Botond's comment; https://phabricator.services.mozilla.com/D119596#3890712

In addition to waitUntilApzStable(), some APZ tests (e.g. this one perform waitUntilApzStable() and forceLayerTreeToCompositor()). I think the forceLayerTreeToCompositor() may be the important one in your case (if the issue is that the compositor doesn't know about the iframe document's layers yet).

The case Botond raised in this comment is, the test fails in between 1) and 2).

Oh, I forgot to put his comment here. Thank you for checking it. Honestly, I'm still not completely familiar with APZ design. So, I don't understand which one is necessary in the various things in forceLayerTreeToCompositior(). I don't like non-APZ test includes the not tiny JS utilities and run it even when it's not necessary for the test...

Hiroyuki Ikezoe (:hiro)

Comment 6

•

3 years ago

Okay, during working on bug 1716404, I've been seeing the race in between 2) and 3), I've never seen the case in between 1) and 2). In the former case we should fix our implementation instead of tweaking tests since users will be able to notice the case since the iframe document has been composited at the moment, which means it's visible.

Hiroyuki Ikezoe (:hiro)

Comment 7

•

3 years ago

(In reply to Hiroyuki Ikezoe (:hiro) from comment #6)

Okay, during working on bug 1716404, I've been seeing the race in between 2) and 3), I've never seen the case in between 1) and 2). In the former case we should fix our implementation instead of tweaking tests since users will be able to notice the case since the iframe document has been composited at the moment, which means it's visible.

Oh no, this is somewhat misleading. What I've been seeing in bug 1716404 is the race in between 1) and 3).

Chris Peterson [:cpeterson]

Comment 8

•

3 years ago

Masayuki or Hiro, will this SimpleTest.waitForFocus change fix tests that are currently failing with Fission? Or will is just make future tests more reliable?

In our Fission bug triage meeting, kmag suggested we go ahead and make this test change. Then we can watch to see how test times are impacted after it lands.

I will track this bug for Fission Milestone MVP for now.

Fission Milestone: ? → MVP

Flags: needinfo?(masayuki)

Flags: needinfo?(hikezoe.birchill)

Hiroyuki Ikezoe (:hiro)

Comment 9

•

3 years ago

(In reply to Chris Peterson [:cpeterson] from comment #8)

Masayuki or Hiro, will this SimpleTest.waitForFocus change fix tests that are currently failing with Fission? Or will is just make future tests more reliable?

I'd say it depends on what each failing test does, some of them will be fixed by the wait, root causes of some of failures will be wall-papered by the wait.

I would not suggest adding the wait without knowing each failure reason.

Flags: needinfo?(hikezoe.birchill)

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Reporter

Comment 10

•

3 years ago

(In reply to Chris Peterson [:cpeterson] from comment #8)

Masayuki or Hiro, will this SimpleTest.waitForFocus change fix tests that are currently failing with Fission? Or will is just make future tests more reliable?

I intended for the latter. This issue shouldn't waste the time of developers who tries to write tests of some user inputs. But hiro said that he does not like the wall-paper...

Flags: needinfo?(masayuki)

Hiroyuki Ikezoe (:hiro)

Comment 11

•

3 years ago

To be clear, I am not opposed adding the wait, what I am objecting is adding the wait without knowing failure reasons, adding it innocently is a sort of magical setTimeout calls, I don't believe such a thing is a good idea.

Hiroyuki Ikezoe (:hiro)

Comment 12

•

3 years ago

Can someone summarize which tests are being affected by this issue? I'd be happy to take a look of each test, specifically where the underlying problem happens as I commented in comment 4. If the problem happens in between 2) and 3) I mentioned, I believe we should fix Gecko's implementation (Though I may be wrong , I am assuming those events in question will be asynchronously processed).

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Reporter

Comment 13

•

3 years ago

These tests, especially when running --enable-fission --enable-xorigin-tests.
https://searchfox.org/mozilla-central/search?q=promiseElementReadyForUserInput&path=&case=false&regexp=false

Hiroyuki Ikezoe (:hiro)

Comment 14

•

3 years ago

•

Edited

Thanks!

I did quick check one the them; test_deltaMode_lines_always_enabled.html. In the test I dropped the await promiseElementReadyForUserInput call and could catch the timeout easily. I also tried to reproduce the timeout with some printfs to see whether the timeout is caused by the issue I am aware of (event handling is processed before we received transform matix from APZ). But in fact, this synthesizeWheel calls after we received the transform matrix. The test is exactly one of the cases I am concerned.

Hiroyuki Ikezoe (:hiro)

Comment 15

•

3 years ago

•

Edited

(In reply to Hiroyuki Ikezoe (:hiro) from comment #14)

I did quick check one the them; test_deltaMode_lines_always_enabled.html. In the test I dropped the await promiseElementReadyForUserInput call and could catch the timeout easily. I also tried to reproduce the timeout with some printfs to see whether the timeout is caused by the issue I am aware of (event handling is processed before we received transform matix from APZ). But in fact, this synthesizeWheel calls after we received the transform matrix. The test is exactly one of the cases I am concerned.

This is a case without WebRender. I do often forget to specify --enable-webrender. :/ Anyway I am not going to dive into detail in this case, am going to note an interesting observation for records, for people who's interested in.

Here is a piece of hit testing tree when the test succeeds;

 HitTestingTreeNode (7ff649396bb0) APZC (7ff6397e8000) g=({ l=0x10000000a, p=1, v=2 }) r=({ }) t=([ 1 0; 0 1; 13 256; ]) c=([13,256,513,556])
   HitTestingTreeNode (7ff649395660) APZC (0) g=(l=0x10000000c) r=({ }) t=([ I ]) c=(none)
     HitTestingTreeNode (7ff649396200) APZC (0) g=(l=0x10000000c) r=({ Hit=[0,0,500,300] }) t=([ I ]) c=(none)
     HitTestingTreeNode (7ff649397f10) APZC (7ff622a2d000) g=({ l=0x10000000c, p=1, v=2 }) r=({ Hit=[0,0,500,956] }) t=([ I ]) c=([0,0,500,300])

l=0x10000000c is the OOP iframe's document in question and there are two hit regions (Hit=[]) properly. Whereas here is a piece of hit test tree when the test fails;

  HitTestingTreeNode (7ff649396bb0) APZC (7ff6397e8000) g=({ l=0x10000000a, p=1, v=2 }) r=({ }) t=([ 1 0; 0 1; 13 256; ]) c=([13,256,513,556]) 
     HitTestingTreeNode (7ff649397370) APZC (7ff622bd7000) g=({ l=0x10000000d, p=1, v=2 }) r=({ }) t=([ I ]) c=([0,0,500,300]

l=0x10000000d is the OOP iframe's document and there's no hit region. This makes me think that the test starts running before painting some contents in the document.

I've also seen that with WebRender in failure cases the test starts running before painting this #overflow element on the main-thread. I believe the #overflow element is an important part of the test, so I wonder the test, test_deltaMode_lines_always_enabled.html should be started after the #overflow element was painted. Masayuki?

Flags: needinfo?(masayuki)

Chris Peterson [:cpeterson]

Comment 16

•

3 years ago

(In reply to Hiroyuki Ikezoe (:hiro) from comment #11)

To be clear, I am not opposed adding the wait, what I am objecting is adding the wait without knowing failure reasons, adding it innocently is a sort of magical setTimeout calls, I don't believe such a thing is a good idea.

Hiro, so you recommend that tests that know they need to wait can add await promiseElementReadyForUserInput, but we shouldn't make SimpleTest.waitForFocus wait for all tests? In that case, should we close this bug as WONTFIX?

Either way, this bug probably doesn't need to be fixed for Fission's MVP milestone, so I will change Fission Milestone from MVP to Future.

Fission Milestone: MVP → Future

Flags: needinfo?(hikezoe.birchill)

Hiroyuki Ikezoe (:hiro)

Comment 17

•

3 years ago

(In reply to Chris Peterson [:cpeterson] from comment #16)

(In reply to Hiroyuki Ikezoe (:hiro) from comment #11)

To be clear, I am not opposed adding the wait, what I am objecting is adding the wait without knowing failure reasons, adding it innocently is a sort of magical setTimeout calls, I don't believe such a thing is a good idea.

Hiro, so you recommend that tests that know they need to wait can add await promiseElementReadyForUserInput, but we shouldn't make SimpleTest.waitForFocus wait for all tests? In that case, should we close this bug as WONTFIX?

No, what I am saying is that we should understand/investigate each failure reason in the first place before doing anything which will be wall-papering the underlying causes.

Either way, this bug probably doesn't need to be fixed for Fission's MVP milestone, so I will change Fission Milestone from MVP to Future.

It depends. If there's an underlying issue which will be able to be observed in the real world and it's not very rare, it should be MVP at least.

Flags: needinfo?(hikezoe.birchill)

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Reporter

Comment 18

•

3 years ago

(In reply to Hiroyuki Ikezoe (:hiro) from comment #15)

(In reply to Hiroyuki Ikezoe (:hiro) from comment #14)
I believe the #overflow element is an important part of the test,

I don't think so, it can be testale with any elements.

so I wonder the test, test_deltaMode_lines_always_enabled.html should be started after the #overflow element was painted. Masayuki?

Exactly, it requires a synthesized event coming from the main process.

(And I'd like there are some document for writing tests with APZ/WebRender in async dispatching event mode.)

Flags: needinfo?(masayuki)

Hiroyuki Ikezoe (:hiro)

Comment 19

•

3 years ago

•

Edited

Note that I've also confirmed that the test started before painting the body element. So what code ensures the paint? It doesn't seems that SimpleTest.promiseFocus ensures it.

Here is a document about APZ https://firefox-source-docs.mozilla.org/gfx/AsyncPanZoom.html, though I know it's not perfect.

Anyways, at least the failure of test_deltaMode_lines_always_enabled.html I've seen is not directly related to APZ, if the content in question hasn't yet been painted, APZ can not tell us the exact position where the event is happening.

Hiroyuki Ikezoe (:hiro)

Comment 20

•

3 years ago

(In reply to Hiroyuki Ikezoe (:hiro) from comment #4)

the OOP iframe's document has been painted

APZ has received the OOP iframe document's information after 1)

the OOP iframe has received the information from APZ after 2)

If the test fails in between 2) and 3), I think we can defer event handling until we receive the information from APZ.

After more digging into detail, I noticed this failure condition between 2) and 3) will never happen. That's because the coordinate conversion is processed in the parent process, so once after 2) has finished the conversion should just work.

Anyways, what I've concluded is all the tests in comment 13 start running before painting any elements in the iframe document (e.g. the root document) in the failure case. With await promiseAllPaintsDone if tehre's pending paint the tests work as expected, here is a try run; https://treeherder.mozilla.org/jobs?repo=try&revision=0759b6424f4032e1039e3f64672417fe3be4263e . Though there are a couple of test_coalesce_mousewheel.html failures, I suppose they are different issues (bug 1717656, bug 1714285, etc.)

Though I am not familiar with event handling stuff (specs), processing event handling for a document which hasn't yet started painting is quite unsound, I mean it's an expected behavior that event handling should just fail in such documents, isn't it?

I am quite unsure that focusing an element ensures painting the element or not, if it should ensure (as per specs), it's an issue on focus handling, if it's not, it's an issue in our test harness (especially in EventUtils.js?).

Masayuki, any insights?

Flags: needinfo?(masayuki)

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Reporter

Comment 21

•

3 years ago

(In reply to Hiroyuki Ikezoe (:hiro) from comment #20)

(In reply to Hiroyuki Ikezoe (:hiro) from comment #4)
Though I am not familiar with event handling stuff (specs), processing event handling for a document which hasn't yet started painting is quite unsound, I mean it's an expected behavior that event handling should just fail in such documents, isn't it?

Yes, it is. Responding to unpainted element's operation may cause serious issue. E.g., unexpectedly submitting a form.

I am quite unsure that focusing an element ensures painting the element or not, if it should ensure (as per specs), it's an issue on focus handling, if it's not, it's an issue in our test harness (especially in EventUtils.js?).

As far as I know, there is no spec about focus handling, but you must point an important issue. If we started to handle user inputs whose target is focused document/element like keyboard before first paint, it might cause unexpected result for the user. So, we need to check this behavior anyway. Henri, do you know about this (you touched focus handling of Fission world).

If waiting focus on a document guarantees first paint, it may be useful in WPT too. (Bug 1601176 might be caused by this issue...)

Flags: needinfo?(masayuki) → needinfo?(hsivonen)

Hiroyuki Ikezoe (:hiro)

Comment 22

•

3 years ago

Thank you, Masayuki. I should add a note that, to be precise, as of now, what each test does is just waiting for focusing on the iframe's window, so it's not a good manner even if focusing something should ensure painting the something, I don't think focusing window ensures painting the window's content.

Henri Sivonen (:hsivonen)

Comment 23

•

3 years ago

(In reply to Masayuki Nakano [:masayuki] (he/him)(JST, +0900) from comment #21)

As far as I know, there is no spec about focus handling, but you must point an important issue. If we started to handle user inputs whose target is focused document/element like keyboard before first paint, it might cause unexpected result for the user. So, we need to check this behavior anyway. Henri, do you know about this (you touched focus handling of Fission world).

IIRC, the initial focus of a framed document happens from the refresh driver. I believe focus itself doesn't trigger painting, but I'm not sure.

Flags: needinfo?(hsivonen)

Hiroyuki Ikezoe (:hiro)

Comment 24

•

3 years ago

I realized that there's an open intermittent failure bug for test_sanityEventUtils.html on Android (bug1704064) it makes me think that this race is not Fission specific (thus it's not APZ specific).

Updated

•

3 years ago

Updated

•

3 years ago