Closed Bug 1361983 Opened 7 years ago Closed 7 years ago

Intermittent test_click.py TestClick.test_click_number_link | NoSuchElementException: Unable to locate element: username (pageunload timer canceled because flushEventLoop returns after beforeunload event)

Categories

(Testing :: Marionette Client and Harness, defect)

Version 3
defect
Not set
normal

Tracking

(firefox54 fixed, firefox55 fixed)

RESOLVED FIXED
mozilla55
Tracking Status
firefox54 --- fixed
firefox55 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: whimboo)

References

Details

(Keywords: intermittent-failure)

Attachments

(1 file)

The fix on bug 1357634 may miss something. So this failure happens because we cancel the page unload timer too early. And that even after 200ms whereby we use an extended timeout of 5000ms:

1493878218361	Marionette	TRACE	374 -> [0,4,"clickElement",{"id":"e31a38ae-c0ef-4939-bf22-3ca7b52c247b"}]
1493878219108	Marionette	DEBUG	Received DOM event "beforeunload" for "http://127.0.0.1:38163/clicks.html"
1493878219318	Marionette	DEBUG	Canceled page load listener because no navigation has been detected
1493878219396	Marionette	TRACE	374 <- [1,4,null,{}]
Assignee: nobody → hskupin
Status: NEW → ASSIGNED
Summary: Intermittent test_click.py TestClick.test_click_number_link | NoSuchElementException: Unable to locate element: username → Intermittent test_click.py TestClick.test_click_number_link | NoSuchElementException: Unable to locate element: username (pageunload timer canceled too early)
Blocks: 1335778
So while deeply inspecting this code I can only see a single code path which could trigger this problem:

We setup the page unload timer right after the trigger callback. But the event handlers are registered before the trigger callback is executed. In the case of `clickElement` there is the `flushEventLoop` call. It currently escapes for the next animation frame, or for the `beforeunload` event. So if the latter is happening, we haven't setup the page unload timer yet, and as such the if condition in `handleEvent` will not re-initialize the timer. But it got set by our code in `navigate`, right after the trigger callback returns. It means we have an overlap here.

Sadly nothing on try reproduces this problem yet, so it got unnoticed and my above idea is just an assumption:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=a7b55c37aa39f9960a6783dcab7287633907e403
The try jobs finally hit a single instance of this problem. The signature of the failure is not identical but caused by the same underlying issue:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=a7b55c37aa39f9960a6783dcab7287633907e403&selectedJob=96941953

The gecko.log exactly shows what is wrong:

1494001827731	Marionette	TRACE	447 -> [0,20,"clickElement",{"id":"13ec7301-33e4-4284-a5dd-d78d9c6bbf12"}]
1494001828035	Marionette	DEBUG	Received DOM event "beforeunload" for "http://127.0.0.1:49365/clicks.html"
1494001828035	Marionette	DEBUG	Having a page unload timer: null
1494001828045	Marionette	DEBUG	Set page unload timer to 200ms
1494001828246	Marionette	DEBUG	Canceled page load listener because no navigation has been detected
1494001828368	Marionette	TRACE	447 <- [1,20,null,{}]

It means my assumption of that a late return from `flushEventLoop` is causing this issue, was correct.

It's not that clear to me right now how to fix it. The easiest would be to have a general page unload timer, which stops the navigation after the 5s and no page unload being detected. But I'm not sure if we can/should do it for commands like `get`, `back`, `forward`, or `refresh`. Those haven't opted-in for the page unload timer.

David, what do you think? Is that a possibility, or should we better find another solution?
Flags: needinfo?(dburns)
Thanks to Dave who provided a minimized testcase (attachment 8863819 [details]) on bug 1335778, I can now perfectly replicate this issue locally.

Testing a bit I now know how to fix it. It's actually obvious when looking at the current code.
Flags: needinfo?(dburns)
The attached patch fixes the issue, but I would like to have a test for this problem. Not sure yet if I can easily minimize the crashstats example, but lets see. I should have the patch ready by Monday.
Blocks: 1362744
Blocks: 1362737
So it looks like that using an onclick event handler on a select/option element, which causes a page load can trigger such a behavior consistently. That's what I got minimized from the crash-stats example. With it I can reproduce the failure 100% of the time, and with my patch attached it just works.

Here some timing:

1494238546361	Marionette	TRACE	6 -> [0,9,"clickElement",{"id":"7c13c82b-b46e-5b46-8dc8-04a4c76aa5f5"}]
1494238546619	Marionette	DEBUG	Received DOM event "mousedown" for "[object HTMLSelectElement]"
1494238546633	Marionette	DEBUG	Received DOM event "mouseup" for "[object HTMLSelectElement]"
1494238546634	Marionette	DEBUG	Received DOM event "beforeunload" for "http://127.0.0.1:61690/clicks.html"
1494238546645	Marionette	DEBUG	Received DOM event "click" for "[object HTMLSelectElement]"
1494238546723	Marionette	DEBUG	Received DOM event "mouseout" for "[object HTMLBodyElement]"
1494238547663	Marionette	DEBUG	Received DOM event "pagehide" for "http://127.0.0.1:61690/clicks.html"
1494238547665	Marionette	DEBUG	Received DOM event "unload" for "http://127.0.0.1:61690/clicks.html"

So the page load already gets started with the `mouseup` event, while the `click` event hasn't been propagated yet.
Attachment #8865009 - Flags: review?(ato)
Summary: Intermittent test_click.py TestClick.test_click_number_link | NoSuchElementException: Unable to locate element: username (pageunload timer canceled too early) → Intermittent test_click.py TestClick.test_click_number_link | NoSuchElementException: Unable to locate element: username (pageunload timer canceled because flushEventLoop returns after beforeunload event)
Comment on attachment 8865009 [details]
Bug 1361983 - Extend page unload timer if flushEventLoop returns after beforeunload event.

https://reviewboard.mozilla.org/r/136666/#review140032

::: commit-message-7c902:8
(Diff revision 3)
> +In cases when the navigation trigger function returns too late and the
> +beforeunload event has already been fired, the page load currently gets
> +canceled. This happens because the page unload timer hasn't been set at
> +this time and the unbeforeunload handler doesn't extend its time.
> +
> +With this patch a flag is used which indicates if an unbeforeunload

unbeforeunload → beforeunload?

::: testing/marionette/harness/marionette_harness/tests/unit/test_click.py:340
(Diff revision 3)
> +    def test_click_option_navigate(self):
> +        self.marionette.find_element(By.ID, 'option').click()
> +        self.marionette.find_element(By.ID, 'delay')

Stay consistent with existing code style in the file by using double quotes.

::: testing/marionette/listener.js:146
(Diff revision 3)
> +    this.seenBeforeUnload = false;
>      this.seenUnload = false;

You could use a Set here for seen events:

> this.seen = new Set();
> 
> …
> 
> this.seen.add(event.type);
> 
> …
> 
> // Set#delete returns true if item existed
> if (this.seen.delete("beforeunload")) {
>   this.timerPageUnload.initWithCallback(…);
>   …
> }

This would save you from having to clear the seen events manually, but it’s also fine to do what you do here.

::: testing/marionette/listener.js:289
(Diff revision 3)
> +        // If no page unload has been detected, ensure to properly stop the load
> +        // listener, and return from the currently active command.
> +        } else if (!this.seenUnload) {
>            logger.debug("Canceled page load listener because no navigation " +
>                "has been detected");
>            this.stop();
>            sendOk(this.command_id);
>          }

this.seenUnload is not reset like this.seenBeforeUnload?
Attachment #8865009 - Flags: review?(ato) → review+
Comment on attachment 8865009 [details]
Bug 1361983 - Extend page unload timer if flushEventLoop returns after beforeunload event.

https://reviewboard.mozilla.org/r/136666/#review140032

> You could use a Set here for seen events:
> 
> > this.seen = new Set();
> > 
> > …
> > 
> > this.seen.add(event.type);
> > 
> > …
> > 
> > // Set#delete returns true if item existed
> > if (this.seen.delete("beforeunload")) {
> >   this.timerPageUnload.initWithCallback(…);
> >   …
> > }
> 
> This would save you from having to clear the seen events manually, but it’s also fine to do what you do here.

I will keep this in mind. If we have to make another change for that code we can get this implemented. Right now I would leave it as is.

> this.seenUnload is not reset like this.seenBeforeUnload?

`this.seenBeforeUnload` is only reset so that we do not run into the same if block again. If we have processed it once, we don't need it anymore. Anything else gets reset when `loadListener.start()` is called.
Pushed by hskupin@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/7761907c8b70
Extend page unload timer if flushEventLoop returns after beforeunload event. r=ato
https://hg.mozilla.org/mozilla-central/rev/7761907c8b70
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Product: Testing → Remote Protocol
Moving bug to Testing::Marionette Client and Harness component per bug 1815831.
Component: Marionette → Marionette Client and Harness
Product: Remote Protocol → Testing
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: