Closed Bug 1517414 Opened 3 years ago Closed 3 years ago
Re-evalute usage of Timed
Promise to avoid various timeout errors
For a while the TimedPromise is used to restrict a couple of Gecko API calls to a certain amount of time. While this sound pretty helpful previously, the given amount of recent test failures caused me to think differently. Now I feel this makes Marionette very unstable and unreliable for a couple of commands, which are mostly window manipulation related. And we actually tried to fix that with bug 1492499. The underlying problem with using short timeout values is that it's unknown for us how Firefox behaves in various environments, and especially for different builds (opt, debug, asan, valgrand). While we tried to solve it with a multiplicator for debug builds  we cannot estimate how those builds behave on eg. slow systems. Those could still cause Firefox to take way longer to finish the API call. As such we should really rethink the usage of the TimedPromise, and I would vote for removing it in all the places, and instead implement a timer to kill long running WebDriver commands, which usually caused an infinite hang in the past. I know that this can become problematic for navigation, executing scripts, and finding elements, because those commands use a user-configurable timeout which would have to be obeyed, or by just excluding those commands from that timer. The result of that would be that we don't care how much time individual API calls take, but make sure that we don't run in an infinite hang. As such we can kill lots of those frequent intermittent failures which are visible those days. Andreas, I would really like to discuss that with you. As best via Vidyo. Please let me know when you have the time.  https://searchfox.org/mozilla-central/rev/0ee0b63732d35d16ba22d5a1120622e2e8d58c29/testing/marionette/sync.js#31
This is hard to discuss on a general basis. There probably are cases where we use TimedPromise where it hides actual problems. This is in party why we emit warnings, so that we are aware of when it happens. These warnings can be useful in narrowing down which scenarios TimedPromise is useful, and when it is a hammer, essentially used for a precision-tool job. The most likely use case for TimedPromise is when manipulating the window may, under some circumstances, not give us a definitive answer. For example we can’t tell on Linux without a WM whether the window becomes maximised, because there is no WM to emit the right X11 signals to Firefox. We also can’t detect that there’s _no WM_, which leaves us with a conundrum. The best thing we can do in this case is try and see if it works. Likely the window will get maximised to the fullest extent of the screen, but since we have no events to tell us this we are forced to abort the wait for an event after some arbitrary duration we have decided on. We can’t apply a timeout for every command, as things like WebDriver:ExecuteScript with a session script timeout of null is expected to run for infinity. But it would probably be a good thing to have for everything else so it’s possible we can guard a general timeout on a subset of commands that require this behaviour. You also mention a few other subclasses of commands that would be affected by this. I would like to hear your thoughts on what we could do for those window manipulation cases where we either aren’t getting, or can’t rely on, events. Waiting for a (longer) overall command timeout is not an option as these commands are called frequently in environments without WMs or enviroments we can’t trust to give us the right events.
You need to log in before you can comment on or make changes to this bug.