Closed Bug 1723451 Opened 3 years ago Closed 3 years ago

Slow network connection cause Marionette handshake to fail over and over again

Categories

(Testing :: geckodriver, defect, P3)


Tracking

(firefox-esr78 wontfix, firefox-esr91 wontfix, firefox91 wontfix, firefox92 wontfix, firefox93 fixed)

RESOLVED FIXED
93 Branch

People

(Reporter: whimboo, Assigned: whimboo)

Details

(Keywords: regression)

Attachments

(2 files)

As noticed on https://github.com/mozilla/geckodriver/issues/1903, there is a recursion in the Marionette handshake when the network is slow. Browsertime scenarios in particular need to test under such conditions, e.g. on 3G.

I had a look, and it looks like a regression from bug 1525126, where I reduced the socket timeout to just 100ms for the handshake only. That is a very short timeout and clearly needs to be increased, or even removed, so that the normal timeout of the socket connection, which should be 60s, gets used.

The downside of removing it is that it would take longer to detect whether a browser process is still running in case of startup crashes.

It would be good to see this bug fixed for the geckodriver 0.30.0 release.

Also, as discussed in the triage meeting, setting needinfo for James.

Flags: needinfo?(james)
Whiteboard: [webdriver:triage]

This is actually not a recursion; geckodriver repeatedly retries the handshake with Marionette. But because every attempt uses the same 100ms timeout, nothing changes between attempts until the overall 60s timeout is hit. We should instead increase the poll interval / timeout with each attempt until the maximum timeout has been reached.
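
To illustrate the idea, here is a minimal sketch of an increasing per-attempt timeout schedule. It is not the patch itself: `handshake_timeouts` is a hypothetical helper, and doubling as the growth strategy is an assumption. Only the 100ms starting value and the 60s cap come from this bug.

```rust
use std::time::Duration;

/// Hypothetical helper (not geckodriver code): returns the read timeout to
/// use for each handshake attempt. The timeout doubles on every attempt
/// (an assumed strategy) and the sum never exceeds `max_total`.
fn handshake_timeouts(initial: Duration, max_total: Duration) -> Vec<Duration> {
    let mut timeouts = Vec::new();
    let mut elapsed = Duration::ZERO;
    // Guard against a zero initial timeout, which would never make progress.
    let mut next = initial.max(Duration::from_millis(1));

    while elapsed < max_total {
        // Never wait longer than what is left of the overall budget.
        let remaining = max_total - elapsed;
        let timeout = next.min(remaining);
        timeouts.push(timeout);
        elapsed += timeout;
        next = next.saturating_mul(2);
    }
    timeouts
}
```

Called as `handshake_timeouts(Duration::from_millis(100), Duration::from_secs(60))`, this yields 100ms, 200ms, 400ms and so on, with the last attempt shortened so the total stays at 60s. A caller would apply each value as the socket read timeout before the corresponding attempt.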

Flags: needinfo?(james)
Summary: Slow network connection cause recursion in Marionette handshake → Slow network connection cause Marionette handshake to fail over and over again
Assignee: nobody → hskupin
Severity: -- → S3
Status: NEW → ASSIGNED
Priority: -- → P3

If we don't read the WebDriver handshake within the socket timeout,
don't reconnect, but reuse the same connection for subsequent attempts
at the read. This means that once the data is sent, OS-level buffering
should ensure we can read it on subsequent attempts, even if the
network is too slow to send the data inside the initial timeout.

Implementation-wise, this turns the connection into an explicit state
machine. Care is taken to ensure that only a single state is ever in
scope; in particular we can't have multiple references to the stream.
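
The approach described above could look roughly like the following sketch. It is not the code from the patch: the `Handshake` type, `poll`, and `is_complete` are hypothetical names, and the `length:payload` framing used to detect a complete handshake is an assumption made for illustration. Each transition consumes the old state by value, so only one state owns the stream at any time.

```rust
use std::io::{self, ErrorKind, Read};

/// Hypothetical connection states; each variant owns the stream.
enum Handshake<S> {
    /// Connected, but the full handshake has not been read yet.
    Pending { stream: S, buf: Vec<u8> },
    /// The complete handshake has been read.
    Done { stream: S, payload: Vec<u8> },
}

/// Assumed framing: "<decimal length>:<payload of that many bytes>".
fn is_complete(buf: &[u8]) -> bool {
    let colon = match buf.iter().position(|&b| b == b':') {
        Some(i) => i,
        None => return false,
    };
    let len = match std::str::from_utf8(&buf[..colon])
        .ok()
        .and_then(|s| s.parse::<usize>().ok())
    {
        Some(n) => n,
        None => return false,
    };
    buf.len() - (colon + 1) >= len
}

impl<S: Read> Handshake<S> {
    fn new(stream: S) -> Self {
        Handshake::Pending { stream, buf: Vec::new() }
    }

    /// Consumes the current state and returns the next one. A read timeout
    /// keeps the same stream and buffer instead of reconnecting.
    fn poll(self) -> io::Result<Self> {
        match self {
            Handshake::Pending { mut stream, mut buf } => {
                let mut chunk = [0u8; 64];
                match stream.read(&mut chunk) {
                    Ok(0) => Err(io::Error::new(
                        ErrorKind::UnexpectedEof,
                        "connection closed during handshake",
                    )),
                    Ok(n) => {
                        buf.extend_from_slice(&chunk[..n]);
                        if is_complete(&buf) {
                            Ok(Handshake::Done { stream, payload: buf })
                        } else {
                            Ok(Handshake::Pending { stream, buf })
                        }
                    }
                    // Timed out: stay pending and retry on the same stream.
                    Err(e) if matches!(e.kind(), ErrorKind::WouldBlock | ErrorKind::TimedOut) => {
                        Ok(Handshake::Pending { stream, buf })
                    }
                    Err(e) => Err(e),
                }
            }
            done => Ok(done),
        }
    }
}
```

With a `TcpStream` that has a read timeout set, a caller would loop on `poll()` until it returns `Done` or the overall deadline passes. Because partial data stays in `buf`, a slow connection makes progress across attempts instead of starting over.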

Attachment #9234322 - Attachment is obsolete: true
Assignee: hskupin → james
Attachment #9234322 - Attachment is obsolete: false
Attachment #9234322 - Attachment description: Bug 1723451 - [geckodriver] Incrementely increase Marionette handshake timeout for slow connections. → WIP: Bug 1723451 - [geckodriver] Incrementely increase Marionette handshake timeout for slow connections.
Assignee: james → hskupin
Attachment #9234322 - Attachment description: WIP: Bug 1723451 - [geckodriver] Incrementely increase Marionette handshake timeout for slow connections. → Bug 1723451 - [geckodriver] Incrementely increase Marionette handshake timeout for slow connections.
Attachment #9234322 - Attachment description: Bug 1723451 - [geckodriver] Incrementely increase Marionette handshake timeout for slow connections. → Bug 1723451 - [geckodriver] Use larger Marionette handshake timeout to not fail for slow connections.
Attachment #9234322 - Attachment description: Bug 1723451 - [geckodriver] Use larger Marionette handshake timeout to not fail for slow connections. → Bug 1723451 - [geckodriver] Incrementely increase Marionette handshake timeout for slow connections.
Attachment #9234322 - Attachment description: Bug 1723451 - [geckodriver] Incrementely increase Marionette handshake timeout for slow connections. → Bug 1723451 - [geckodriver] Use larger Marionette handshake timeout to not fail for slow connections.

As it turned out, the remaining issues were caused by a bug in throttle, which was recently fixed in its 3.0 release. The bug also slowed down the network for localhost on macOS, including any connection via adb. With that fix there is no longer a problem for the folks at Browsertime, so we are fine to land the current patch.

Pushed by hskupin@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/d82efb06794c [geckodriver] Use larger Marionette handshake timeout to not fail for slow connections. r=webdriver-reviewers,jgraham,jdescottes
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 93 Branch

Is this something we'd want to backport? Not sure how geckodriver releases work :)

Flags: needinfo?(hskupin)

No, we always release geckodriver from mozilla-central. So we can mark any other branch as wontfix.
