Closed Bug 1393366 Opened 2 years ago Closed 2 years ago

Intermittent MarionetteException: Please start a session

Categories

(Testing :: web-platform-tests, defect, P5)

Version 3
defect

Tracking

(firefox57 fixed, firefox58 fixed)

RESOLVED FIXED
mozilla58
Tracking Status
firefox57 --- fixed
firefox58 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: jgraham)

References

Details

(Keywords: intermittent-failure)

Attachments

(1 file)

Filed by: archaeopteryx [at] coole-files.de

https://treeherder.mozilla.org/logviewer.html#?job_id=125453601&repo=autoland

https://queue.taskcluster.net/v1/task/Zo5fzjWxS5Ov57a8fqAGcQ/runs/0/artifacts/public/logs/live_backing.log

https://hg.mozilla.org/mozilla-central/raw-file/tip/layout/tools/reftest/reftest-analyzer.xhtml#logurl=https://queue.taskcluster.net/v1/task/Zo5fzjWxS5Ov57a8fqAGcQ/runs/0/artifacts/public/logs/live_backing.log&only_show_unexpected=1

[task 2017-08-24T08:05:57.773272Z] 08:05:57     INFO - TEST-START | /webvtt/rendering/cues-with-video/processing-model/repaint.html
[task 2017-08-24T08:05:57.773751Z] 08:05:57     INFO - TEST-SKIP | /webvtt/rendering/cues-with-video/processing-model/repaint.html | took 0ms
[task 2017-08-24T08:05:57.774256Z] 08:05:57     INFO - TEST-START | /webvtt/rendering/cues-with-video/processing-model/selectors/cue_function/class_object/class_white-space_normal_wrapped.html
[task 2017-08-24T08:05:57.774759Z] 08:05:57     INFO - TEST-SKIP | /webvtt/rendering/cues-with-video/processing-model/selectors/cue_function/class_object/class_white-space_normal_wrapped.html | took 0ms
[task 2017-08-24T08:05:57.775285Z] 08:05:57     INFO - TEST-START | /webvtt/rendering/cues-with-video/processing-model/selectors/cue_function/class_object/class_white-space_pre-line_wrapped.html
[task 2017-08-24T08:05:57.775851Z] 08:05:57     INFO - TEST-SKIP | /webvtt/rendering/cues-with-video/processing-model/selectors/cue_function/class_object/class_white-space_pre-line_wrapped.html | took 0ms
[task 2017-08-24T08:05:57.776326Z] 08:05:57     INFO - TEST-START | /webvtt/rendering/cues-with-video/processing-model/size_50.html
[task 2017-08-24T08:05:57.776808Z] 08:05:57     INFO - TEST-SKIP | /webvtt/rendering/cues-with-video/processing-model/size_50.html | took 0ms
[task 2017-08-24T08:05:58.276937Z] 08:05:58     INFO - Setting up ssl
[task 2017-08-24T08:05:58.292789Z] 08:05:58     INFO - certutil | 
[task 2017-08-24T08:05:58.308858Z] 08:05:58     INFO - certutil | 
[task 2017-08-24T08:05:58.316676Z] 08:05:58     INFO - certutil | 
[task 2017-08-24T08:05:58.317651Z] 08:05:58     INFO - Certificate Nickname                                         Trust Attributes
[task 2017-08-24T08:05:58.318596Z] 08:05:58     INFO -                                                              SSL,S/MIME,JAR/XPI
[task 2017-08-24T08:05:58.318928Z] 08:05:58     INFO - 
[task 2017-08-24T08:05:58.319735Z] 08:05:58     INFO - web-platform-tests                                           CT,, 
[task 2017-08-24T08:05:58.320324Z] 08:05:58     INFO - 
[task 2017-08-24T08:05:58.321114Z] 08:05:58     INFO - Application command: /home/worker/workspace/build/application/firefox/firefox --marionette about:blank -profile /tmp/tmphZHAjF.mozrunner
[task 2017-08-24T08:05:58.329293Z] 08:05:58     INFO - Starting runner
[task 2017-08-24T08:05:59.522096Z] 08:05:59     INFO - PID 883 | 1503561959504	addons.xpi	WARN	Error parsing extensions state: [Exception... "Component returned failure code: 0x80520012 (NS_ERROR_FILE_NOT_FOUND) [amIAddonManagerStartup.readStartupData]"  nsresult: "0x80520012 (NS_ERROR_FILE_NOT_FOUND)"  location: "JS frame :: resource://gre/modules/addons/XPIProvider.jsm :: loadExtensionState :: line 1523"  data: no] Stack trace: loadExtensionState()@resource://gre/modules/addons/XPIProvider.jsm:1523 < getInstallState()@resource://gre/modules/addons/XPIProvider.jsm:1558 < checkForChanges()@resource://gre/modules/addons/XPIProvider.jsm:3093 < startup()@resource://gre/modules/addons/XPIProvider.jsm:2160 < callProvider()@resource://gre/modules/AddonManager.jsm:263 < _startProvider()@resource://gre/modules/AddonManager.jsm:733 < startup()@resource://gre/modules/AddonManager.jsm:900 < startup()@resource://gre/modules/AddonManager.jsm:3084 < observe()@jar:file:///home/worker/workspace/build/application/firefox/omni.ja!/components/addonManager.js:65
[task 2017-08-24T08:06:58.376990Z] 08:06:58  WARNING - Failed to connect to Marionette
[task 2017-08-24T08:06:58.378785Z] 08:06:58     INFO - STDERR: Traceback (most recent call last):
[task 2017-08-24T08:06:58.379145Z] 08:06:58     INFO -   File "/home/worker/workspace/build/tests/web-platform/tests/tools/wptrunner/wptrunner/testrunner.py", line 130, in start_runner
[task 2017-08-24T08:06:58.379432Z] 08:06:58     INFO -     stop_flag.set()
[task 2017-08-24T08:06:58.379737Z] 08:06:58     INFO -   File "/home/worker/workspace/build/tests/web-platform/tests/tools/wptrunner/wptrunner/testrunner.py", line 67, in __exit__
[task 2017-08-24T08:06:58.380023Z] 08:06:58     INFO -     self.teardown()
[task 2017-08-24T08:06:58.380306Z] 08:06:58     INFO -   File "/home/worker/workspace/build/tests/web-platform/tests/tools/wptrunner/wptrunner/testrunner.py", line 75, in teardown
[task 2017-08-24T08:06:58.380570Z] 08:06:58     INFO -     self.executor.teardown()
[task 2017-08-24T08:06:58.380863Z] 08:06:58     INFO -   File "/home/worker/workspace/build/tests/web-platform/tests/tools/wptrunner/wptrunner/executors/executormarionette.py", line 511, in teardown
[task 2017-08-24T08:06:58.381149Z] 08:06:58     INFO -     self.implementation.teardown()
[task 2017-08-24T08:06:58.381493Z] 08:06:58     INFO -   File "/home/worker/workspace/build/tests/web-platform/tests/tools/wptrunner/wptrunner/executors/executormarionette.py", line 604, in teardown
[task 2017-08-24T08:06:58.381767Z] 08:06:58     INFO -     self.executor.protocol.marionette._send_message("reftest:teardown", {})
[task 2017-08-24T08:06:58.382056Z] 08:06:58     INFO -   File "/home/worker/workspace/build/venv/local/lib/python2.7/site-packages/marionette_driver/decorators.py", line 23, in _
[task 2017-08-24T08:06:58.382357Z] 08:06:58     INFO -     return func(*args, **kwargs)
[task 2017-08-24T08:06:58.382654Z] 08:06:58     INFO -   File "/home/worker/workspace/build/venv/local/lib/python2.7/site-packages/marionette_driver/marionette.py", line 720, in _send_message
[task 2017-08-24T08:06:58.382950Z] 08:06:58     INFO -     raise errors.MarionetteException("Please start a session")
[task 2017-08-24T08:06:58.383242Z] 08:06:58     INFO - MarionetteException: Please start a session
[task 2017-08-24T08:06:58.444193Z] 08:06:58     INFO - Browser exited with return code -15
[task 2017-08-24T08:06:58.447017Z] 08:06:58  WARNING - u'log' (u'critical', {u'message': 'Traceback (most recent call last):\n  File "/home/worker/workspace/build/tests/web-platform/tests/tools/wptrunner/wptrunner/testrunner.py", line 130, in start_runner\n    stop_flag.set()\n  File "/home/worker/workspace/build/tests/web-platform/tests/tools/wptrunner/wptrunner/testrunner.py", line 67, in __exit__\n    self.teardown()\n  File "/home/worker/workspace/build/tests/web-platform/tests/tools/wptrunner/wptrunner/testrunner.py", line 75, in teardown\n    self.executor.teardown()\n  File "/home/worker/workspace/build/tests/web-platform/tests/tools/wptrunner/wptrunner/executors/executormarionette.py", line 511, in teardown\n    self.implementation.teardown()\n  File "/home/worker/workspace/build/tests/web-platform/tests/tools/wptrunner/wptrunner/executors/executormarionette.py", line 604, in teardown\n    self.executor.protocol.marionette._send_message("reftest:teardown", {})\n  File "/home/worker/workspace/build/venv/local/lib/python2.7/site-packages/marionette_driver/decorators.py", line 23, in _\n    return func(*args, **kwargs)\n  File "/home/worker/workspace/build/venv/local/lib/python2.7/site-packages/marionette_driver/marionette.py", line 720, in _send_message\n    raise errors.MarionetteException("Please start a session")\nMarionetteException: Please start a session\n'})
[task 2017-08-24T08:06:58.448324Z] 08:06:58     INFO - Got 0 unexpected results
This looks to be web-platform-tests related. It's only waiting 1min for the browser being up and running, including Marionette being initialized. As we have seen in the past this is too less time especially for a debug build as in this case.

James, can you have a look?
Component: Marionette → web-platform-tests
Flags: needinfo?(james)
I think you found out what was wrong with the timeout here? wpt is supposed to wait 60 * timeout_multiplier seconds, so it's only 60s for opt builds. That said, it seems very suspicious that this only happens on linux32; unless that hardware is much slower it seems plausible that there's some other underlying issue.
Flags: needinfo?(james)
wpt is using marionette-driver and not marionette-harness. So if you do not specify 60s explicitly, the default by the driver is 120s:

http://searchfox.org/mozilla-central/rev/44c693914255638d74bcf1ec3b0bcd448dfab2fd/testing/marionette/client/marionette_driver/marionette.py#556

But this is only used when you make use of `wait_for_port` or `raise_for_port` before the first call to `start_session`. Starting a session defaults to 60s and uses this value inappropriately to wait for the server to be ready. Instead it has to use the 120s: 

http://searchfox.org/mozilla-central/rev/44c693914255638d74bcf1ec3b0bcd448dfab2fd/testing/marionette/client/marionette_driver/marionette.py#1173-1202

I will take care of that in my patch on bug 1397675.
Depends on: 1397675
Still a problem in recent builds:

[task 2017-09-16T18:03:37.917Z] 18:03:37     INFO - Application command: /builds/worker/workspace/build/application/firefox/firefox --marionette about:blank -profile /tmp/tmpixumO2.mozrunner
[task 2017-09-16T18:03:37.925Z] 18:03:37     INFO - Starting runner
[task 2017-09-16T18:04:38.026Z] 18:04:38  WARNING - Failed to connect to Marionette

Which means there is still a startup_timeout of 60s set, which is too less for tests as run in our automation. Code related to this is:

> startup_timeout = marionette.Marionette.DEFAULT_STARTUP_TIMEOUT * self.timeout_multiplier

But why 60s with a timeout_multiplier of 1? It looks like that wpt-tests are not installing the in-tree version of Marionette driver but from PyPI.

James, is that correct and wanted? If yes, I would have to do a Marionette driver and client release.
Flags: needinfo?(james)
It does install marionette from in-tree; it's installed (yes, using pip) from marionette-requirements.txt which is testing/config/marionette-requirements.txt, which points at paths in the tree. I'm reasonably sure about this because otherwise stuff I added to marionette e.g. for reftests wouldn't have worked. I don't know why it looks like there's still a one-minute timeout, but it must be coming from somewhere.
Flags: needinfo?(james)
Ok, so I had another look at this and we seem to have different instances of this failure. One in `tearDown` as mentioned above, and another one in `setUp`:

https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-central&job_id=131503999&lineNumber=1275

[task 2017-09-16T18:04:38.026Z] 18:04:38  WARNING - Failed to connect to Marionette
[task 2017-09-16T18:04:38.030Z] 18:04:38     INFO - STDERR: Traceback (most recent call last):
[task 2017-09-16T18:04:38.031Z] 18:04:38     INFO -   File "/builds/worker/workspace/build/tests/web-platform/tests/tools/wptrunner/wptrunner/testrunner.py", line 128, in start_runner
[task 2017-09-16T18:04:38.032Z] 18:04:38     INFO -     runner.run()
[task 2017-09-16T18:04:38.033Z] 18:04:38     INFO -   File "/builds/worker/workspace/build/tests/web-platform/tests/tools/wptrunner/wptrunner/testrunner.py", line 84, in run
[task 2017-09-16T18:04:38.033Z] 18:04:38     INFO -     self.setup()
[task 2017-09-16T18:04:38.033Z] 18:04:38     INFO -   File "/builds/worker/workspace/build/tests/web-platform/tests/tools/wptrunner/wptrunner/testrunner.py", line 71, in setup
[task 2017-09-16T18:04:38.033Z] 18:04:38     INFO -     self.executor.setup(self)
[task 2017-09-16T18:04:38.034Z] 18:04:38     INFO -   File "/builds/worker/workspace/build/tests/web-platform/tests/tools/wptrunner/wptrunner/executors/executormarionette.py", line 455, in setup
[task 2017-09-16T18:04:38.034Z] 18:04:38     INFO -     self.implementation.setup(**self.implementation_kwargs)
[task 2017-09-16T18:04:38.034Z] 18:04:38     INFO -   File "/builds/worker/workspace/build/tests/web-platform/tests/tools/wptrunner/wptrunner/executors/executormarionette.py", line 532, in setup
[task 2017-09-16T18:04:38.034Z] 18:04:38     INFO -     self.executor.protocol.marionette.set_context(self.executor.protocol.marionette.CONTEXT_CHROME)
[task 2017-09-16T18:04:38.034Z] 18:04:38     INFO -   File "/builds/worker/workspace/build/venv/local/lib/python2.7/site-packages/marionette_driver/marionette.py", line 1498, in set_context
[task 2017-09-16T18:04:38.035Z] 18:04:38     INFO -     self._send_message("setContext", {"value": context})
[task 2017-09-16T18:04:38.035Z] 18:04:38     INFO -   File "/builds/worker/workspace/build/venv/local/lib/python2.7/site-packages/marionette_driver/decorators.py", line 23, in _
[task 2017-09-16T18:04:38.035Z] 18:04:38     INFO -     return func(*args, **kwargs)
[task 2017-09-16T18:04:38.035Z] 18:04:38     INFO -   File "/builds/worker/workspace/build/venv/local/lib/python2.7/site-packages/marionette_driver/marionette.py", line 740, in _send_message
[task 2017-09-16T18:04:38.035Z] 18:04:38     INFO -     raise errors.MarionetteException("Please start a session")
[task 2017-09-16T18:04:38.035Z] 18:04:38     INFO - MarionetteException: Please start a session

In both cases we have the timeout of 60s.

Interestingly no further failures are visible since Sep 16th. I think we should wait until Monday and see what OF updates we get.
Comment on attachment 8911111 [details]
Bug 1393366 - Increase the marionette connect timeout in wptrunner,

https://reviewboard.mozilla.org/r/182600/#review187882

::: commit-message-9192c:6
(Diff revision 1)
> +Bug 1393366 - Increase the marionette connect timeout in wptrunner, r=maja_zf
> +
> +This switches to using a timeout of 120s * timeout_multplier, which is
> +twice the current value. The Marionette startup_timout value is
> +removed because it turns out that's only used when marionette starts
> +the Firefox instance and therefore is ignored in this case.

No, it is not. That's what I already said on IRC yesterday. Also in the case Marionette doesn't start the instance we have to wait for Marionette server to be ready. The first time this can take longer, and as such warrants a longer timeout.

Please note that there is a bug in wait_for_port:
https://dxr.mozilla.org/mozilla-central/source/testing/marionette/client/marionette_driver/marionette.py#680-681

This should not default to `self.DEFAULT_STARTUP_TIMEOUT` but `self.startup_timeout`.

With that you wouldn't have to specify your timeout via `wait_for_port`, but as it was before via `__init__()`
Comment on attachment 8911111 [details]
Bug 1393366 - Increase the marionette connect timeout in wptrunner,

https://reviewboard.mozilla.org/r/182600/#review187882

> No, it is not. That's what I already said on IRC yesterday. Also in the case Marionette doesn't start the instance we have to wait for Marionette server to be ready. The first time this can take longer, and as such warrants a longer timeout.
> 
> Please note that there is a bug in wait_for_port:
> https://dxr.mozilla.org/mozilla-central/source/testing/marionette/client/marionette_driver/marionette.py#680-681
> 
> This should not default to `self.DEFAULT_STARTUP_TIMEOUT` but `self.startup_timeout`.
> 
> With that you wouldn't have to specify your timeout via `wait_for_port`, but as it was before via `__init__()`

As far as I can tell, this patch does what it says, works with the current code, and is likely to fix the problem. It is possible that some future iteration of the marionette client will change things so that this patch no longer represents the only possible way to set the relevant timeout. But you will *still* be able to pass a timeout to start_session, and since that's what we do here.

Am I missing something that means this patch actually doesn't work? If not I don't think we should hold it up over theoretical concerns.
Comment on attachment 8911111 [details]
Bug 1393366 - Increase the marionette connect timeout in wptrunner,

https://reviewboard.mozilla.org/r/182600/#review188026
Attachment #8911111 - Flags: review?(mjzffr) → review+
Pushed by mjzffr@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/7f3cc80412c6
Increase the marionette connect timeout in wptrunner, r=maja_zf
https://hg.mozilla.org/mozilla-central/rev/7f3cc80412c6
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla58
Assignee: nobody → james
You need to log in before you can comment on or make changes to this bug.