Slow network connections cause the Marionette handshake to fail over and over again
Categories
(Testing :: geckodriver, defect, P3)
Tracking
(firefox-esr78 wontfix, firefox-esr91 wontfix, firefox91 wontfix, firefox92 wontfix, firefox93 fixed)
People
(Reporter: whimboo, Assigned: whimboo)
References
(Regression, )
Details
(Keywords: regression)
Attachments
(2 files)
As noticed in https://github.com/mozilla/geckodriver/issues/1903, there is a recursion in the Marionette handshake on slow network setups. This matters especially for Browsertime scenarios, which need to test on e.g. 3G connections.
I had a look, and it appears to be a regression from bug 1525126, where I reduced the socket timeout to just 100ms for the handshake only. That is a very short timeout and clearly needs to be increased, or even removed, so that the normal timeout of the socket connection, which should be 60s, gets used.
The downside of removing it would be that it takes longer to detect whether a browser process is still running in case of startup crashes.
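To illustrate the effect of the handshake timeout, here is a minimal Rust sketch (not geckodriver code). It assumes a peer that sends its greeting only after a delay. `read_greeting` and `spawn_slow_server` are hypothetical helpers, and the `hello` payload is a placeholder rather than the real Marionette handshake.

```rust
use std::io::{self, Read, Write};
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::thread;
use std::time::Duration;

/// Connect to `addr` and try to read a greeting within `timeout`.
/// Hypothetical helper for illustration only.
fn read_greeting(addr: SocketAddr, timeout: Duration) -> io::Result<Vec<u8>> {
    let mut stream = TcpStream::connect(addr)?;
    // The read below fails with a timeout error if no data arrives in time.
    stream.set_read_timeout(Some(timeout))?;
    let mut buf = [0u8; 64];
    let n = stream.read(&mut buf)?;
    Ok(buf[..n].to_vec())
}

/// Start a local server that waits `delay` before sending a greeting,
/// standing in for a browser reached over a slow link (assumption).
fn spawn_slow_server(delay: Duration) -> io::Result<SocketAddr> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    thread::spawn(move || {
        if let Ok((mut conn, _)) = listener.accept() {
            thread::sleep(delay);
            // The client may already have given up; ignore write errors.
            let _ = conn.write_all(b"hello");
        }
    });
    Ok(addr)
}
```

With a server delay of 300ms, `read_greeting(addr, Duration::from_millis(100))` returns a timeout error (reported as `WouldBlock` or `TimedOut` depending on the platform), while a timeout of a few seconds reads the greeting.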
It would be good to see this bug fixed for the geckodriver 0.30.0 release.
Comment 1•3 years ago
To add more details:
- Setting the shorter timeout: https://searchfox.org/mozilla-central/source/testing/geckodriver/src/marionette.rs#1126
- Loop to detect browser startup crashes: https://searchfox.org/mozilla-central/source/testing/geckodriver/src/marionette.rs#1087-1111
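The general shape of such a crash-detecting connect loop can be sketched as follows. This is an assumption-laden illustration, not the code behind the links above: `wait_for_connection`, `try_connect` and `browser_alive` are hypothetical names, and both the connection attempt and the liveness check are passed in as closures.

```rust
use std::io;
use std::thread;
use std::time::{Duration, Instant};

/// Retry `try_connect` until it succeeds, the timeout expires, or
/// `browser_alive` reports that the process has gone away.
fn wait_for_connection<T>(
    mut try_connect: impl FnMut() -> io::Result<T>,
    mut browser_alive: impl FnMut() -> bool,
    poll_interval: Duration,
    timeout: Duration,
) -> io::Result<T> {
    let deadline = Instant::now() + timeout;
    loop {
        // Fail fast on a startup crash instead of waiting for the full timeout.
        if !browser_alive() {
            return Err(io::Error::new(
                io::ErrorKind::Other,
                "browser process exited",
            ));
        }
        match try_connect() {
            Ok(conn) => return Ok(conn),
            // Out of time: report the last connection error.
            Err(e) if Instant::now() >= deadline => return Err(e),
            // Otherwise wait a little and try again.
            Err(_) => thread::sleep(poll_interval),
        }
    }
}
```

One way to implement `browser_alive` is `std::process::Child::try_wait`, which returns `Ok(None)` while the child process is still running. The trade-off mentioned in the description shows up here: the longer each `try_connect` call may block, the later a crashed browser is noticed.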
Comment 2•3 years ago
Also, as discussed in the triage meeting, requesting needinfo from James.
Comment 3•3 years ago
This is actually not a recursion but repeated attempts to pass the handshake with Marionette. But given that we always use the same 100ms timeout, nothing changes between attempts until the overall 60s timeout is hit. As such we should increase the poll interval / timeout for each try until the maximum timeout has been reached.
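A minimal sketch of that idea in Rust, assuming the timeout doubles on every attempt (the comment does not say how fast it should grow) and that the sum of all attempts is capped at the maximum. `attempt_timeouts` is a hypothetical helper, not part of geckodriver.

```rust
use std::time::Duration;

/// Timeouts for successive handshake attempts: start at `initial`,
/// double each time (an assumed growth rule), and never let the total
/// exceed `max_total`.
fn attempt_timeouts(initial: Duration, max_total: Duration) -> Vec<Duration> {
    let mut timeouts = Vec::new();
    // Guard against a zero initial timeout, which would never make progress.
    let mut next = initial.max(Duration::from_millis(1));
    let mut spent = Duration::ZERO;
    while spent < max_total {
        let remaining = max_total - spent;
        // The last attempt only gets whatever budget is left.
        let t = next.min(remaining);
        timeouts.push(t);
        spent += t;
        next = next.saturating_mul(2);
    }
    timeouts
}
```

For `initial = 100ms` and `max_total = 60s` this yields ten attempts: 100ms, 200ms, and so on up to 25.6s, followed by a final 8.9s that uses up the remaining budget.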
Comment 4•3 years ago
Comment 5•3 years ago
If we don't read the WebDriver handshake within the socket timeout,
don't reconnect, but reuse the same connection for subsequent attempts
at the read. This means that once the data is sent, OS-level buffering
should ensure we can read it on subsequent attempts, even if the
network is too slow to send the data inside the initial timeout.
Implementation wise, this turns the connection into an explicit state
machine. Care is taken to ensure that only a single state is ever in
scope; in particular we can't have multiple references to the stream.
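The approach described above can be sketched roughly as follows. This is not the landed patch: the `Conn` enum, its variants, and the `handshake` helper are hypothetical, and a trailing newline stands in for the real message framing. Each state owns the stream, and `poll` consumes the old state, so only one state (and one handle to the socket) exists at a time.

```rust
use std::io::{self, Read};
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Hypothetical connection states; every variant owns the stream.
#[allow(dead_code)] // `Ready::stream` would be used for later commands
enum Conn {
    /// Connected, greeting not fully read yet.
    AwaitingHandshake { stream: TcpStream, buf: Vec<u8> },
    /// Greeting received; `hello` holds the raw bytes.
    Ready { stream: TcpStream, hello: Vec<u8> },
}

impl Conn {
    fn connect(addr: SocketAddr, read_timeout: Duration) -> io::Result<Conn> {
        let stream = TcpStream::connect(addr)?;
        stream.set_read_timeout(Some(read_timeout))?;
        Ok(Conn::AwaitingHandshake { stream, buf: Vec::new() })
    }

    /// One read attempt. Taking `self` by value moves the stream into
    /// the next state, so the previous state can no longer be used.
    fn poll(self) -> io::Result<Conn> {
        match self {
            Conn::AwaitingHandshake { mut stream, mut buf } => {
                let mut chunk = [0u8; 256];
                match stream.read(&mut chunk) {
                    Ok(0) => Err(io::ErrorKind::UnexpectedEof.into()),
                    Ok(n) => {
                        buf.extend_from_slice(&chunk[..n]);
                        // Assumed framing: the greeting ends with '\n'.
                        if buf.ends_with(b"\n") {
                            Ok(Conn::Ready { stream, hello: buf })
                        } else {
                            Ok(Conn::AwaitingHandshake { stream, buf })
                        }
                    }
                    // Timed out: keep the same socket instead of reconnecting.
                    Err(e) if matches!(
                        e.kind(),
                        io::ErrorKind::WouldBlock | io::ErrorKind::TimedOut
                    ) => Ok(Conn::AwaitingHandshake { stream, buf }),
                    Err(e) => Err(e),
                }
            }
            ready => Ok(ready),
        }
    }
}

/// Drive the state machine for at most `attempts` reads.
fn handshake(addr: SocketAddr, per_read: Duration, attempts: usize) -> io::Result<Vec<u8>> {
    let mut conn = Conn::connect(addr, per_read)?;
    for _ in 0..attempts {
        match conn.poll()? {
            Conn::Ready { hello, .. } => return Ok(hello),
            pending => conn = pending,
        }
    }
    Err(io::ErrorKind::TimedOut.into())
}
```

Because the socket is kept open across timeouts, a greeting that arrives after the first 100ms window is still sitting in the receive buffer for a later `poll`, which is the OS-level buffering the commit message relies on.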
Comment 6•3 years ago
As it turned out, the remaining issues were caused by a bug in throttle, which was fixed in the recent 3.0 release. It also slowed down the network for localhost on macOS, including any connection via adb. With that fix there is no longer a problem for the folks at Browsertime, so we are fine to get the current patch landed.
Comment 8•3 years ago
bugherder
Comment 9•3 years ago
Is this something we'd want to backport? Not sure how geckodriver releases work :)
Comment 10•3 years ago
No, we always release geckodriver from mozilla-central. So we can mark any other branch as wontfix.