Closed Bug 1723451 Opened 3 years ago Closed 3 years ago

Slow network connection cause Marionette handshake to fail over and over again

Tracking

(firefox-esr78 wontfix, firefox-esr91 wontfix, firefox91 wontfix, firefox92 wontfix, firefox93 fixed)

Status: RESOLVED FIXED

Milestone: 93 Branch

Tracking Flags:

Release         Tracking   Status
firefox-esr78   ---        wontfix
firefox-esr91   ---        wontfix
firefox91       ---        wontfix
firefox92       ---        wontfix
firefox93       ---        fixed

People

(Reporter: whimboo, Assigned: whimboo)

Details

(Keywords: regression)

Attachments

(2 files)

Bug 1723451 - [geckodriver] Use larger Marionette handshake timeout to not fail for slow connections. 3 years ago Henrik Skupin [:whimboo][⌚️UTC+2] (away 10/03 - 10/13) 48 bytes, text/x-phabricator-request
Bug 1723451 - Don't reconnect if we fail to read marionette handshake, 3 years ago James Graham [:jgraham] 48 bytes, text/x-phabricator-request

Henrik Skupin [:whimboo][⌚️UTC+2] (away 10/03 - 10/13)

Assignee

Description

3 years ago

As noticed in https://github.com/mozilla/geckodriver/issues/1903, there is a recursion in the Marionette handshake on slow network setups. This matters especially for Browsertime scenarios, which need to test on e.g. 3G connections.

I had a look, and it looks like a regression from bug 1525126, where I reduced the socket timeout to just 100ms for the handshake only. That is a very short timeout and clearly needs to be increased, or even removed so that the normal timeout of the socket connection, which should be 60s, gets used.

The downside of removing it is that it would take longer to detect whether a browser process is still running in case of startup crashes.
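
For illustration only, here is a minimal Rust sketch of a read guarded by a socket timeout, the mechanism discussed above. It is not the geckodriver code: `read_greeting` is a hypothetical helper, and parsing of the actual Marionette greeting is left out. It only shows how a short timeout makes the read give up before a slow peer has sent anything.

```rust
use std::io::{self, Read};
use std::net::TcpStream;
use std::time::Duration;

/// Hypothetical helper: wait up to `timeout` for the first bytes from the peer.
/// Returns `Ok(None)` when nothing arrived in time, `Ok(Some(bytes))` otherwise.
/// Note: `set_read_timeout` rejects a zero duration with an error.
fn read_greeting(stream: &mut TcpStream, timeout: Duration) -> io::Result<Option<Vec<u8>>> {
    stream.set_read_timeout(Some(timeout))?;
    let mut buf = [0u8; 256];
    match stream.read(&mut buf) {
        // An empty vector here means the peer closed the connection.
        Ok(n) => Ok(Some(buf[..n].to_vec())),
        // A timed-out read surfaces as WouldBlock on Unix and TimedOut on Windows.
        Err(e) if matches!(e.kind(), io::ErrorKind::WouldBlock | io::ErrorKind::TimedOut) => {
            Ok(None)
        }
        Err(e) => Err(e),
    }
}
```

With a 100ms timeout, a peer that needs 300ms to send its greeting yields `Ok(None)`; the same call with a timeout of several seconds returns the bytes.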

It would be good to see this bug fixed for the geckodriver 0.30.0 release.

Henrik Skupin [:whimboo][⌚️UTC+2] (away 10/03 - 10/13)

Assignee

Comment 1

3 years ago

To add more details:

Setting shorter timeout: https://searchfox.org/mozilla-central/source/testing/geckodriver/src/marionette.rs#1126
Loop to detect browser startup crashes: https://searchfox.org/mozilla-central/source/testing/geckodriver/src/marionette.rs#1087-1111

Henrik Skupin [:whimboo][⌚️UTC+2] (away 10/03 - 10/13)

Assignee

Comment 2

3 years ago

Also, as discussed in the triage meeting, setting needinfo for James.

Flags: needinfo?(james)

Whiteboard: [webdriver:triage]

Henrik Skupin [:whimboo][⌚️UTC+2] (away 10/03 - 10/13)

Assignee

Comment 3

3 years ago

This is actually not a recursion but a repeated attempt to pass the handshake with Marionette. Because every attempt uses the same 100ms timeout, nothing changes between attempts until the overall 60s timeout is hit. As such, we should increase the poll interval / timeout with each try until the maximum timeout has been reached.
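
One possible shape of such an increasing schedule, as a sketch only: the doubling factor and the function name `handshake_timeouts` are assumptions for illustration and are not taken from the patch.

```rust
use std::time::Duration;

/// Hypothetical schedule: start at `initial`, double on every attempt, and
/// trim the last entry so that the sum of all timeouts equals `max`.
fn handshake_timeouts(initial: Duration, max: Duration) -> Vec<Duration> {
    let mut schedule = Vec::new();
    // Guard against a zero start value, which would never make progress.
    let mut current = initial.max(Duration::from_millis(1));
    let mut total = Duration::ZERO;
    while total < max {
        // Never wait longer than what is left of the overall budget.
        let timeout = current.min(max - total);
        schedule.push(timeout);
        total += timeout;
        current = current.saturating_mul(2);
    }
    schedule
}
```

Starting at 100ms with a 60s budget, this gives ten attempts (100ms, 200ms, ... 25.6s, then the remaining 8.9s) instead of hundreds of identical 100ms attempts.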

Flags: needinfo?(james)

Henrik Skupin [:whimboo][⌚️UTC+2] (away 10/03 - 10/13)

Assignee

Updated

3 years ago

Summary: Slow network connection cause recursion in Marionette handshake → Slow network connection cause Marionette handshake to fail over and over again

Henrik Skupin [:whimboo][⌚️UTC+2] (away 10/03 - 10/13)

Assignee

Updated

3 years ago

Assignee: nobody → hskupin

Severity: -- → S3

Status: NEW → ASSIGNED

Priority: -- → P3

Henrik Skupin [:whimboo][⌚️UTC+2] (away 10/03 - 10/13)

Assignee

Comment 4

3 years ago

Attached file Bug 1723451 - [geckodriver] Use larger Marionette handshake timeout to not fail for slow connections.

James Graham [:jgraham]

Comment 5

3 years ago

Attached file Bug 1723451 - Don't reconnect if we fail to read marionette handshake,

If we don't read the WebDriver handshake within the socket timeout,
don't reconnect, but reuse the same connection for subsequent attempts
at the read. This means that once the data is sent, OS-level buffering
should ensure we can read it on subsequent attempts, even if the
network is too slow to send the data inside the initial timeout.

Implementation-wise, this turns the connection into an explicit state
machine. Care is taken to ensure that only a single state is ever in
scope; in particular we can't have multiple references to the stream.
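
A rough Rust sketch of that idea, under stated assumptions: the names `Handshake` and `step` are hypothetical, and the greeting is treated as complete once a terminator byte arrives, which simplifies the real Marionette framing. Each `step` consumes the old state and returns the next one, so the stream always has exactly one owner and a timed-out read keeps the same connection.

```rust
use std::io::{self, Read};

/// Hypothetical connection states; each variant owns the stream exclusively.
enum Handshake<S> {
    /// Connected, greeting not complete yet; partial bytes are kept.
    Reading { stream: S, buf: Vec<u8> },
    /// Greeting fully received.
    Done { stream: S, greeting: Vec<u8> },
}

impl<S: Read> Handshake<S> {
    fn new(stream: S) -> Self {
        Handshake::Reading { stream, buf: Vec::new() }
    }

    /// Consume the current state and return the next one. A timed-out read
    /// returns `Reading` with the SAME stream instead of reconnecting.
    fn step(self, terminator: u8) -> io::Result<Self> {
        match self {
            Handshake::Reading { mut stream, mut buf } => {
                let mut chunk = [0u8; 64];
                match stream.read(&mut chunk) {
                    Ok(0) => Err(io::Error::new(
                        io::ErrorKind::UnexpectedEof,
                        "peer closed the connection during the handshake",
                    )),
                    Ok(n) => {
                        buf.extend_from_slice(&chunk[..n]);
                        if buf.contains(&terminator) {
                            Ok(Handshake::Done { stream, greeting: buf })
                        } else {
                            Ok(Handshake::Reading { stream, buf })
                        }
                    }
                    // Timeout: keep the stream and whatever was read so far.
                    Err(e) if matches!(
                        e.kind(),
                        io::ErrorKind::WouldBlock | io::ErrorKind::TimedOut
                    ) => Ok(Handshake::Reading { stream, buf }),
                    Err(e) => Err(e),
                }
            }
            // Already finished: nothing left to do.
            done => Ok(done),
        }
    }
}
```

A caller would loop on `state = state.step(b'}')?` until it sees `Handshake::Done`, checking between steps whether the browser process is still alive.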

Phabricator Automation

Updated

3 years ago

Attachment #9234322 - Attachment is obsolete: true

Henrik Skupin [:whimboo][⌚️UTC+2] (away 10/03 - 10/13)

Assignee

Updated

3 years ago

Assignee: hskupin → james

Phabricator Automation

Updated

3 years ago

Attachment #9234322 - Attachment is obsolete: false

Phabricator Automation

Updated

3 years ago

Attachment #9234322 - Attachment description: Bug 1723451 - [geckodriver] Incrementely increase Marionette handshake timeout for slow connections. → WIP: Bug 1723451 - [geckodriver] Incrementely increase Marionette handshake timeout for slow connections.

Henrik Skupin [:whimboo][⌚️UTC+2] (away 10/03 - 10/13)

Assignee

Updated

3 years ago

Assignee: james → hskupin

Phabricator Automation

Updated

3 years ago

Attachment #9234322 - Attachment description: WIP: Bug 1723451 - [geckodriver] Incrementely increase Marionette handshake timeout for slow connections. → Bug 1723451 - [geckodriver] Incrementely increase Marionette handshake timeout for slow connections.

Phabricator Automation

Updated

3 years ago

Attachment #9234322 - Attachment description: Bug 1723451 - [geckodriver] Incrementely increase Marionette handshake timeout for slow connections. → Bug 1723451 - [geckodriver] Use larger Marionette handshake timeout to not fail for slow connections.

Phabricator Automation

Updated

3 years ago

Attachment #9234322 - Attachment description: Bug 1723451 - [geckodriver] Use larger Marionette handshake timeout to not fail for slow connections. → Bug 1723451 - [geckodriver] Incrementely increase Marionette handshake timeout for slow connections.

Henrik Skupin [:whimboo][⌚️UTC+2] (away 10/03 - 10/13)

Assignee

Comment 6

3 years ago

As it turned out, the remaining issues were caused by a bug in throttle, which was fixed recently in the 3.0 release. It also slowed down the network for localhost on macOS, including any connection via adb. With that fix there is no longer a problem for the folks at Browsertime, so we are fine to land the current patch.

Pulsebot

Comment 7

3 years ago

Pushed by hskupin@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/d82efb06794c [geckodriver] Use larger Marionette handshake timeout to not fail for slow connections. r=webdriver-reviewers,jgraham,jdescottes

Andreea Pavel [:apavel]

Comment 8

3 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/d82efb06794c

Status: ASSIGNED → RESOLVED

Closed: 3 years ago

status-firefox93: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 93 Branch

Ryan VanderMeulen [:RyanVM]

Comment 9

3 years ago

Is this something we'd want to backport? Not sure how geckodriver releases work :)

status-firefox91: --- → affected

status-firefox92: --- → affected

status-firefox-esr78: --- → wontfix

status-firefox-esr91: --- → affected

Flags: needinfo?(hskupin)

Henrik Skupin [:whimboo][⌚️UTC+2] (away 10/03 - 10/13)

Assignee

Comment 10

3 years ago

No, we always release geckodriver from mozilla-central. So we can mark any other branch as wontfix.

status-firefox91: affected → wontfix

status-firefox92: affected → wontfix

status-firefox-esr91: affected → wontfix

Flags: needinfo?(hskupin)
