Closed Bug 1517414 Opened 1 year ago Closed 1 year ago

Re-evalute usage of TimedPromise to avoid various timeout errors

Categories

(Testing :: Marionette, defect, P2)

defect

Tracking

(firefox-esr60 unaffected, firefox64 unaffected, firefox65 fixed, firefox66 fixed)

RESOLVED FIXED
mozilla66
Tracking Status
firefox-esr60 --- unaffected
firefox64 --- unaffected
firefox65 --- fixed
firefox66 --- fixed

People

(Reporter: whimboo, Assigned: whimboo)

References

(Depends on 1 open bug)

Details

(Keywords: regression)

For a while the TimedPromise is used to restrict a couple of Gecko API calls to a certain amount of time. While this sound pretty helpful previously, the given amount of recent test failures caused me to think differently. Now I feel this makes Marionette very unstable and unreliable for a couple of commands, which are mostly window manipulation related. And we actually tried to fix that with bug 1492499.

The underlying problem with using short timeout values is that it's unknown for us how Firefox behaves in various environments, and especially for different builds (opt, debug, asan, valgrand). While we tried to solve it with a multiplicator for debug builds [1] we cannot estimate how those builds behave on eg. slow systems. Those could still cause Firefox to take way longer to finish the API call.

As such we should really rethink the usage of the TimedPromise, and I would vote for removing it in all the places, and instead implement a timer to kill long running WebDriver commands, which usually caused an infinite hang in the past.

I know that this can become problematic for navigation, executing scripts, and finding elements, because those commands use a user-configurable timeout which would have to be obeyed, or by just excluding those commands from that timer.

The result of that would be that we don't care how much time individual API calls take, but make sure that we don't run in an infinite hang. As such we can kill lots of those frequent intermittent failures which are visible those days.

Andreas, I would really like to discuss that with you. As best via Vidyo. Please let me know when you have the time.

[1] https://searchfox.org/mozilla-central/rev/0ee0b63732d35d16ba22d5a1120622e2e8d58c29/testing/marionette/sync.js#31
This is hard to discuss on a general basis.

There probably are cases where we use TimedPromise where it hides
actual problems.  This is in party why we emit warnings, so that
we are aware of when it happens.  These warnings can be useful in
narrowing down which scenarios TimedPromise is useful, and when it
is a hammer, essentially used for a precision-tool job.

The most likely use case for TimedPromise is when manipulating the
window may, under some circumstances, not give us a definitive
answer.  For example we can’t tell on Linux without a WM whether
the window becomes maximised, because there is no WM to emit the
right X11 signals to Firefox.  We also can’t detect that there’s
_no WM_, which leaves us with a conundrum.

The best thing we can do in this case is try and see if it works.
Likely the window will get maximised to the fullest extent of the
screen, but since we have no events to tell us this we are forced
to abort the wait for an event after some arbitrary duration we
have decided on.

We can’t apply a timeout for every command, as things like
WebDriver:ExecuteScript with a session script timeout of null is
expected to run for infinity.  But it would probably be a good thing
to have for everything else so it’s possible we can guard a general
timeout on a subset of commands that require this behaviour.  You
also mention a few other subclasses of commands that would be
affected by this.

I would like to hear your thoughts on what we could do for those
window manipulation cases where we either aren’t getting, or can’t
rely on, events.  Waiting for a (longer) overall command timeout
is not an option as these commands are called frequently in
environments without WMs or enviroments we can’t trust to give us
the right events.
No longer blocks: 1504756

Lets see what additional logging for the use cases we have for window manipulation commands shows us. Hopefully I will get to bug 1510940 soon.

Depends on: 1510940
No longer blocks: 1515867

We discussed this topic in the WebDriver meeting yesterday. The minutes can be found here:
https://wiki.mozilla.org/WebDriver/Meetings/2019-01-21#Minutes

As agreed on we will keep the usage of the TimedPromise, but have raised the timeout for all the window manipulation commands to 5000ms. Also none of them should raise an actual timeout error, given that we actually should do our best to reach the window state, but if it is not possible we shouldn't fail. All this work was done on bug 1521527.

Assignee: nobody → hskupin
Status: NEW → RESOLVED
Closed: 1 year ago
Depends on: 1521527
Priority: -- → P2
Resolution: --- → FIXED
Target Milestone: --- → mozilla66
No longer blocks: 1516975
No longer blocks: 1510305
You need to log in before you can comment on or make changes to this bug.