Closed Bug 1739008 Opened 9 months ago Closed 9 months ago

Intermittent Mn/en-US | marionette_driver.errors.SessionNotCreatedException: TimeoutError: TimedPromise timed out after 300000 ms

Categories

(Testing :: Marionette, defect, P5)

defect

Tracking

(firefox-esr91 fixed, firefox94 unaffected, firefox95 fixed, firefox96 fixed)

RESOLVED FIXED
96 Branch
Tracking Status
firefox-esr91 --- fixed
firefox94 --- unaffected
firefox95 --- fixed
firefox96 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: jdescottes)

References

(Regression)

Details

(Keywords: intermittent-failure, regression, Whiteboard: [retriggered][stockwell unknown])

Attachments

(2 files)

Filed by: smolnar [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=356776264&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/KfYnZqAYSaitQYvxolShIQ/runs/0/artifacts/public/logs/live_backing.log


INFO - TEST-START | testing/marionette/harness/marionette_harness/tests/unit/test_execute_async_script.py TestExecuteAsyncContent.test_execute_js_exception
[task 2021-11-02T19:38:34.919Z] 19:38:34     INFO -  1635881914919	Marionette	DEBUG	Accepted connection 350 from 127.0.0.1:44086
[task 2021-11-02T19:38:34.920Z] 19:38:34     INFO -  1635881914920	Marionette	DEBUG	Closed connection 349
[task 2021-11-02T19:38:34.924Z] 19:38:34     INFO -  1635881914924	Marionette	DEBUG	350 -> [0,1,"WebDriver:NewSession",{"strictFileInteractability":true}]
[task 2021-11-02T19:38:44.084Z] 19:38:44     INFO -  DEBUG: Adding blocker SessionFile: Finish writing Session Restore data for phase profile-before-change
[task 2021-11-02T19:38:44.111Z] 19:38:44     INFO -  DEBUG: Completed blocker SessionFile: Finish writing Session Restore data for phase profile-before-change
[task 2021-11-02T19:38:59.117Z] 19:38:59     INFO -  DEBUG: Adding blocker SessionFile: Finish writing Session Restore data for phase profile-before-change
[task 2021-11-02T19:38:59.142Z] 19:38:59     INFO -  DEBUG: Completed blocker SessionFile: Finish writing Session Restore data for phase profile-before-change
[task 2021-11-02T19:43:34.933Z] 19:43:34     INFO -  1635882214932	Marionette	DEBUG	350 <- [1,1,{"error":"session not created","message":"TimeoutError: TimedPromise timed out after 300000 ms","stacktrace":"WebDriverE ... r@chrome://remote/content/shared/webdriver/Errors.jsm:450:5\nbail@chrome://remote/content/marionette/sync.js:230:19\n"},null]
[task 2021-11-02T19:43:34.938Z] 19:43:34     INFO - TEST-UNEXPECTED-ERROR | testing/marionette/harness/marionette_harness/tests/unit/test_execute_async_script.py TestExecuteAsyncContent.test_execute_js_exception | marionette_driver.errors.SessionNotCreatedException: TimeoutError: TimedPromise timed out after 300000 ms
[task 2021-11-02T19:43:34.939Z] 19:43:34     INFO - stacktrace:
[task 2021-11-02T19:43:34.939Z] 19:43:34     INFO - 	WebDriverError@chrome://remote/content/shared/webdriver/Errors.jsm:181:5
[task 2021-11-02T19:43:34.939Z] 19:43:34     INFO - 	TimeoutError@chrome://remote/content/shared/webdriver/Errors.jsm:450:5
[task 2021-11-02T19:43:34.939Z] 19:43:34     INFO - 	bail@chrome://remote/content/marionette/sync.js:230:19
[task 2021-11-02T19:43:34.939Z] 19:43:34     INFO - Traceback (most recent call last):
[task 2021-11-02T19:43:34.939Z] 19:43:34     INFO -   File "/builds/worker/workspace/build/venv/lib/python3.6/site-packages/marionette_harness/marionette_test/testcases.py", line 183, in run
[task 2021-11-02T19:43:34.939Z] 19:43:34     INFO -     self.setUp()
[task 2021-11-02T19:43:34.939Z] 19:43:34     INFO -   File "/builds/worker/workspace/build/tests/marionette/tests/testing/marionette/harness/marionette_harness/tests/unit/test_execute_async_script.py", line 18, in setUp
[task 2021-11-02T19:43:34.939Z] 19:43:34     INFO -     super(TestExecuteAsyncContent, self).setUp()
[task 2021-11-02T19:43:34.939Z] 19:43:34     INFO -   File "/builds/worker/workspace/build/venv/lib/python3.6/site-packages/marionette_harness/marionette_test/testcases.py", line 405, in setUp
[task 2021-11-02T19:43:34.940Z] 19:43:34     INFO -     super(MarionetteTestCase, self).setUp()
[task 2021-11-02T19:43:34.940Z] 19:43:34     INFO -   File "/builds/worker/workspace/build/venv/lib/python3.6/site-packages/marionette_harness/marionette_test/testcases.py", line 308, in setUp
[task 2021-11-02T19:43:34.940Z] 19:43:34     INFO -     self.marionette.start_session()
[task 2021-11-02T19:43:34.940Z] 19:43:34     INFO -   File "/builds/worker/workspace/build/venv/lib/python3.6/site-packages/marionette_driver/decorators.py", line 27, in _
[task 2021-11-02T19:43:34.940Z] 19:43:34     INFO -     return func(*args, **kwargs)
[task 2021-11-02T19:43:34.940Z] 19:43:34     INFO -   File "/builds/worker/workspace/build/venv/lib/python3.6/site-packages/marionette_driver/marionette.py", line 1187, in start_session
[task 2021-11-02T19:43:34.940Z] 19:43:34     INFO -     resp = self._send_message("WebDriver:NewSession", capabilities)
[task 2021-11-02T19:43:34.940Z] 19:43:34     INFO -   File "/builds/worker/workspace/build/venv/lib/python3.6/site-packages/marionette_driver/decorators.py", line 27, in _
[task 2021-11-02T19:43:34.940Z] 19:43:34     INFO -     return func(*args, **kwargs)
[task 2021-11-02T19:43:34.940Z] 19:43:34     INFO -   File "/builds/worker/workspace/build/venv/lib/python3.6/site-packages/marionette_driver/marionette.py", line 630, in _send_message
[task 2021-11-02T19:43:34.941Z] 19:43:34     INFO -     self._handle_error(err)
[task 2021-11-02T19:43:34.941Z] 19:43:34     INFO -   File "/builds/worker/workspace/build/venv/lib/python3.6/site-packages/marionette_driver/marionette.py", line 652, in _handle_error
[task 2021-11-02T19:43:34.941Z] 19:43:34     INFO -     raise errors.lookup(error)(message, stacktrace=stacktrace)
[task 2021-11-02T19:43:34.941Z] 19:43:34     INFO - TEST-INFO took 300018ms
[task 2021-11-02T19:43:34.941Z] 19:43:34     INFO - TEST-START | testing/marionette/harness/marionette_harness/tests/unit/test_execute_async_script.py TestExecuteAsyncContent.test_execute_no_return

@whimbo, could you take a look at these failures
started to fail after bug 1736323 landed.

Flags: needinfo?(hskupin)

Will take a look today

Flags: needinfo?(hskupin) → needinfo?(jdescottes)

Please also check if bug 1739066 and bug 1739074 It's uncommon to see new Marionette failures which have been observed several times for a day.

Yes, those seem to be also related.

Blocks: 1739066, 1739074

Set release status flags based on info from the regressing bug 1736323

Started adding some logs, as expected we timeout before waitForNavigationCompleted in newSession. Meaning we get a webprogresslistener with isLoading set to true, but we never actually get the navigation events.

It's not that surprising because the pattern we use is racy by nature.

     if (browsingContext.webProgress.isLoadingDocument) {
        await navigate.waitForNavigationCompleted(this, () => {}, {
          loadEventExpected: true,
          unknownState: true,
        });
      }

In the parent process we detect that we have a load, but then we will communicate to the content process in order to add load listeners. So before the listeners can be added, the navigation might complete and we might never receive any load event.

Either we need to fully monitor the load from the parent process, or we need to be a bit more careful in navigate.js in case we pass unknownState to true. I think we should check the state of the document (or of the webprogresslistener?) in the content process before expecting a load event.

Flags: needinfo?(jdescottes)
See Also: → 1739231
Whiteboard: [retriggered][stockwell needswork:owner]
Summary: Intermittent testing/marionette/harness/marionette_harness/tests/<random> TestExecuteAsyncContent.test_execute_js_exception/TestCloseWindow.test_close_chrome_window_for_browser_window | marionette_driver.errors.SessionNotCreatedException: TimeoutError: Ti → Intermittent Mn | marionette_driver.errors.SessionNotCreatedException: TimeoutError: TimedPromise timed out after 300000 ms
See Also: 1739231
Duplicate of this bug: 1739231

I discussed the topic with Julian yesterday and he will try out some options. If we cannot find the right solution until afternoon today I will go ahead and request a backout of bug 1736323 given that this new hang in WebDriver:NewSession is way more severe.

Assignee: nobody → jdescottes
Status: NEW → ASSIGNED
Pushed by jdescottes@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/78d6ea95a82f
[marionette] Monitor navigation using the webprogresslistener for NewSession r=webdriver-reviewers,whimboo

Backed out for causing android wpt failures.

Flags: needinfo?(jdescottes)
Summary: Intermittent Mn | marionette_driver.errors.SessionNotCreatedException: TimeoutError: TimedPromise timed out after 300000 ms → Intermittent Mn/en-US | marionette_driver.errors.SessionNotCreatedException: TimeoutError: TimedPromise timed out after 300000 ms
Pushed by jdescottes@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/ff1b3692fe81
[marionette] Monitor navigation using the webprogresslistener for NewSession r=webdriver-reviewers,whimboo

Renewing my need-info: will have to uplift this and Bug 1736323 to beta. Uplifting patches from both bugs will allow to reuse the same patches as the ones used against central. Let's wait to see if the current patch sticks.

Flags: needinfo?(jdescottes)
Flags: needinfo?(jdescottes)
Status: ASSIGNED → RESOLVED
Closed: 9 months ago
Resolution: --- → FIXED
Target Milestone: --- → 96 Branch
No longer regressions: 1740021
See Also: → 1740021

Comment on attachment 9249223 [details]
Bug 1739008 - [marionette] Monitor navigation using the webprogresslistener for NewSession

Beta/Release Uplift Approval Request

  • User impact if declined: Users of marionette/geckodriver (eg selenium webdriver users) are exposed to a race condition that can lead to sessions created too early, and therefore making their scripts fail. Hitting this race condition became much more frequent after Bug 1735500 landed in Firefox 95.

We fixed it over Bug 1736323 and Bug 1739008 (this bug), which only landed in 96. We should uplift both patches to 95 to avoid issues due to Bug 1735500.

  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: Bug 1736323
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Low risk as it only affects marionette users and is changing the logic to create new sessions in order to wait for a load in case we are creating a session when a document is not loaded.
    The new session logic is extensively tested and relied upon in many tests and test suites.
  • String changes made/needed:
Flags: needinfo?(jdescottes)
Attachment #9249223 - Flags: approval-mozilla-beta?

Comment on attachment 9249223 [details]
Bug 1739008 - [marionette] Monitor navigation using the webprogresslistener for NewSession

Approved for 95 beta 5, thanks.

Attachment #9249223 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Can we get a rebased patch for ESR now that the regressing patch was uplifted there?

Flags: needinfo?(hskupin)

I'm not sure if we need an uplift here. It regressed because of a feature (bug 1735500) that landed for Firefox 95. My try build hasn't shown any failing tests for MacOS on esr91.

Flags: needinfo?(hskupin) → needinfo?(ryanvm)

Interesting. Julian, would you be able to provide a backport patch for esr91?

Flags: needinfo?(hskupin) → needinfo?(jdescottes)

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration: This fixes a regression from Bug 1736323 which was just uplifted to ESR91
User impact if declined: user impact should be relatively low, this issue mostly leads to intermittent failures and pollutes our CI
Fix Landed on Version: 95 (landed in 96 but was uplifted)
Risk to taking this patch (and alternatives if risky): Low, same comment as for the beta uplift: Only affects marionette users and is changing the logic to create new sessions in order to wait for a load in case we are creating a session when a document is not loaded. The new session logic is extensively tested and relied upon in many tests and test suites.

See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.

Attachment #9266494 - Flags: approval-mozilla-esr91?

(In reply to Julian Descottes [:jdescottes] from comment #31)

Let's see how this goes on try: https://treeherder.mozilla.org/jobs?repo=try&revision=1250341fb5106d114e2f8c99507b39c9f7475ec9

Note that you triggered the wdspec jobs for MacOS only. But the two failures that we got are from Linux. So you may want to extend to maybe all desktop platforms.

Flags: needinfo?(jdescottes)

triggered some additional jobs, but feel free to add more

Flags: needinfo?(jdescottes)

All the triggered tests are passing. So I'm fine with the following commit to get uplifted to esr91:
https://hg.mozilla.org/try/rev/e15b4708e41b39258862912dcbc8cd94d72ba028

Ryan can you help us with the uplift?

Flags: needinfo?(ryanvm)

Comment on attachment 9266494 [details]
bug_1739008_esr91_backport.patch

Approved for esr91, thanks.

Flags: needinfo?(ryanvm)
Attachment #9266494 - Flags: approval-mozilla-esr91? → approval-mozilla-esr91+
You need to log in before you can comment on or make changes to this bug.