Marionette should retry creating the server socket in case the port is in use

RESOLVED WONTFIX

Status

Testing
Marionette
RESOLVED WONTFIX
2 years ago
2 years ago

People

(Reporter: whimboo, Assigned: whimboo)

Tracking

Trunk
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

MozReview Requests

Submitter Diff Changes Open Issues Last Updated
Loading...
Error loading review requests:

Attachments

(1 attachment)

(Assignee)

Description

2 years ago
Before we have a proper solution for bug 1240830, we should improve Marionette server so it tries constantly to start a socket server at the given port. As we have seen on bug 1293404 it can happen that another browser session is still open and it takes a while until the port is free again.

A change like that would help us to kill some of our intermittent failures.
(Assignee)

Updated

2 years ago
Blocks: 1293404
Comment hidden (mozreview-request)
(Assignee)

Updated

2 years ago
Attachment #8789373 - Flags: review?(ato)

Comment 2

2 years ago
mozreview-review
Comment on attachment 8789373 [details]
Bug 1301320 - Marionette should retry creating the server socket in case the port is in use.

https://reviewboard.mozilla.org/r/77630/#review75942

I wouldn’t like to introduce code that isn’t deterministic into Marionette.  I also think this isn’t the right place to address the fact that a socket from another browser session might still be open.

We could have the client find a random free port instead of using a hardcoded port.  The client would then release it and pass set the marionette.defaultPrefs.port preference.

The approach I describe is inherently racy, but has shown to work fairly well in practice.  At least until we have a proper fix for bug 1240830.
Attachment #8789373 - Flags: review?(ato) → review-
(Assignee)

Comment 3

2 years ago
mozreview-review-reply
Comment on attachment 8789373 [details]
Bug 1301320 - Marionette should retry creating the server socket in case the port is in use.

https://reviewboard.mozilla.org/r/77630/#review75942

No, this will not work if you are doing a restart of the application from inside itself. There is no chance for the marionette client to find another port. We have to continue using the same port, and as such have the hope that the old process frees it up within our default startup timeout of 120s.

Comment 4

2 years ago
mozreview-review-reply
Comment on attachment 8789373 [details]
Bug 1301320 - Marionette should retry creating the server socket in case the port is in use.

https://reviewboard.mozilla.org/r/77630/#review75942

17:10 <ato> whimboo: I see. How big of an issue is this? If it only happens infrequently I’d lean towards writing a proper patch for the underlying issue instead.
17:12 <ato> It’s odd that we can’t properly close the socket before the application exits.
17:14 aselagea|buildduty → aselagea|afk
17:15 <ato> So the restarts call GeckoDriver#quitApplication, right? I wonder why MarionetteServer#stop that gets called in stopSignal doesn’t close the socket.
17:16 <ato> Oh I see: https://developer.mozilla.org/en-US/docs/Mozilla/Tech/XPCOM/Reference/Interface/nsIServerSocket#close()
17:16 <ato> We need to cycle through the client sockets already connected and close each one.
17:21 <ato> Or we need to explicitly close the socket in the client when we request a restart.
(Assignee)

Comment 5

2 years ago
mozreview-review-reply
Comment on attachment 8789373 [details]
Bug 1301320 - Marionette should retry creating the server socket in case the port is in use.

https://reviewboard.mozilla.org/r/77630/#review75942

The current situation is that due to a change in the updater component all of our update tests on Linux are failing for mozilla-central, and mozilla-aurora. It's not clear if Matt will be able to get this fixed via bug 1293404 before we reach beta. But thankfully we get another week before the merge will actually happen.

The problem here is not related to how we close the socket, but in how Firefox performs the update. With the change in bug 1272614 we now have a second instance running via forking the process. As result the updated version of Firefox cannot use the port which is still blocked by the updater, which itself exits after some seconds and releases the port.

Regarding the closing situation, this is what I added via bug 1293982 to the client. For both restart() and quit() methods we force a shutdown of the socket connection.
(Assignee)

Updated

2 years ago
Flags: needinfo?(ato)
(Assignee)

Comment 6

2 years ago
Andreas, 6 days passed by without a reply. As mentioned on IRC it's a somewhat critical situation for us. Can you please make sure to reply soon, so we can find a solution which will work for us both? Maybe David can also chime in here.

If we cannot get a solution by tomorrow I will have to disable update tests on Linux for the upcoming Firefox beta, or cut out an important piece of the tests.

Thanks.

Comment 7

2 years ago
mozreview-review-reply
Comment on attachment 8789373 [details]
Bug 1301320 - Marionette should retry creating the server socket in case the port is in use.

https://reviewboard.mozilla.org/r/77630/#review75942

> No, this will not work if you are doing a restart of the application from inside itself. There is no chance for the marionette client to find another port. We have to continue using the same port, and as such have the hope that the old process frees it up within our default startup timeout of 120s.

I still don’t understand why the listening socket isn’t released when the process ends.  Do you have a good explanation for this?
Flags: needinfo?(ato)
(Assignee)

Comment 8

2 years ago
mozreview-review-reply
Comment on attachment 8789373 [details]
Bug 1301320 - Marionette should retry creating the server socket in case the port is in use.

https://reviewboard.mozilla.org/r/77630/#review75942

So currently we have a bug in the updater which demonstrates it very well. After forking the process it doesn't wait long enough (until the updater has finished) before starting the new version of Firefox. At this time two processes are running, whereby the newer one is not able to claim the socket port. Once the former process closes, Marionette server could pick-up the port because it got released.
(Assignee)

Comment 9

2 years ago
mozreview-review-reply
Comment on attachment 8789373 [details]
Bug 1301320 - Marionette should retry creating the server socket in case the port is in use.

https://reviewboard.mozilla.org/r/77630/#review75942

Btw. today we will check if a proposed patch for the updater will fix this very specific issue. So far it's not clear if it will fix it, so I still would like to have a fallback available which we can push to aurora.

Comment 10

2 years ago
mozreview-review-reply
Comment on attachment 8789373 [details]
Bug 1301320 - Marionette should retry creating the server socket in case the port is in use.

https://reviewboard.mozilla.org/r/77630/#review75942

It doesn’t feel to me like the solution is to introduce non-deterministic code to Marionette (especially code that busy-waits) when the problem lies in the updater.  If it’s not waiting for the update to finish before spawning a new process, that’s a serious matter.

The other option is to implement something in Marionette that passes the port it binds to back to the client through some IPC mechanism, and have it bind to port 0.  Then the client initiates a Firefox restart we can teach it to pick up the new port for the next TCP connection from there.
(Assignee)

Comment 11

2 years ago
Please see the latest comments on bug 1240830. I only propose this as a workaround until we have a proper solution ready in the other bug. Also doing this has a high risk of regressions, so it is not something I want to land on aurora or even beta.

Given that we don't get a consensus here I will mark this bug as wontfix, and hope that we can get the updater fixed until we reach beta by next week.
Status: ASSIGNED → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.