Closed Bug 1290372 Opened 3 years ago Closed 3 years ago

wait_for_port() doesn't take process status of application into account (eg. return early if process does not exist)

Categories

(Testing :: Marionette, defect)

Version 3
defect
Not set

Tracking

(firefox52 fixed)

RESOLVED FIXED
mozilla52
Tracking Status
firefox52 --- fixed

People

(Reporter: whimboo, Assigned: whimboo)

References

Details

Attachments

(1 file)

Right now when we call wait_for_port() we wait until the socket timeout, and trying constantly to create a socket connection. We could drastically reduce the duration of that check if the status of the process would be taken into account. It means if the returncode is not None, the process has been shutdown or died. In such a case it doesn't make sense to wait longer for the socket server to be started.

I will first check how to improve the socket timeout situation over on bug 1284457 before getting to this bug.
Summary: wait_for_port() doesn't take process status of application into account (return early of process does not exist) → wait_for_port() doesn't take process status of application into account (eg. return early if process does not exist)
The code I'm referring to is here:

https://dxr.mozilla.org/mozilla-central/rev/506facea63169a29e04eb140663da1730052db64/testing/marionette/client/marionette_driver/marionette.py#645

As far as I can see the code always gets called for an already running or freshly started instance. So in case of a startup crash we definitely would run into a timeout, and waste about 120s of our time for nothing.

Maybe we should get the loop out of transport.py and implement it in wait_for_port(). We can then rename the method in transport.py to something like check_for_port() which returns immediately. With that change wait_for_port() could also check for the process status (returncode) because the Marionette class has the runner instance.

Andreas, do you agree with that? If yes, I would like to make it a mentored bug. Thanks.
Flags: needinfo?(ato)
The return code is only something I believe we can check for Firefox, as it isn’t really a thing for Fennec (or was for B2G).  transport.wait_for_port is as far as I can tell only used in marionette.py, i.e. there’s no internal use in transport.py of it, so I do have some sympathy towards making what is essentially a “has Gecko started up yet and spun up the Marionette server, and can we speak to it” smarter.

So I have no problems with detaching transport.wait_for_port from transport, or making it look at more things than just socket connection.  But I need clarification if this will work for mobile on device and emulator (maybe it’s fine to skip those checks there).  Are there more things we should look at?
Flags: needinfo?(ato)
I do not see a difference between Firefox and Fennec. Both are applications we launch and should have the returncode for when the process ends. So at any time we know if the process is running or not. Even in case of B2G the returncode might have been always None (running), so it shouldn't harm us.

I don't see anything else we should check for at this point. I just want to make sure that we do not run into a timeout only because the process is not alive. The time we spent here can be better used for other jobs.

Thanks for your quick reply!
Mentor: hskupin
Whiteboard: [lang=py]
(In reply to Henrik Skupin (:whimboo) from comment #3)
> I do not see a difference between Firefox and Fennec. Both are applications
> we launch and should have the returncode for when the process ends. So at
> any time we know if the process is running or not.

The “process” for Fennec is of the Android emulator, right?  I didn’t think the exit code of the emulator corresponded to the exit code of the Fennec Android app.
What we care about here is the process status or exit code of Fennec only. That is the application Marionette works with.
You should be able to treat Fennec and Desktop instances the same way in Marionette. 

When dealing with Fennec in Marionette, self.instance.runner.returncode and .is_running refer to the remote process (Fennec on the emulator/device) and self.instance.runner.device refers to the Emulator process.
Depends on: 1301661
Assignee: nobody → hskupin
Status: NEW → ASSIGNED
Attachment #8793728 - Flags: review?(ato)
Comment on attachment 8793728 [details]
Bug 1290372 - wait_for_port() has to return early if instance is not running.

https://reviewboard.mozilla.org/r/80410/#review80232

::: testing/marionette/client/marionette_driver/marionette.py:646
(Diff revision 1)
>          except socket.error:
>              return False
>          finally:
>              s.close()
>  
> -    def wait_for_port(self, timeout=None):
> +    def wait_for_port(self, host=None, port=None, timeout=None):

I can’t see that this is ever called with the `host` or `port` arguments?

Also I don’t think we should make this public.

::: testing/marionette/client/marionette_driver/marionette.py:647
(Diff revision 1)
>              return False
>          finally:
>              s.close()
>  
> -    def wait_for_port(self, timeout=None):
> +    def wait_for_port(self, host=None, port=None, timeout=None):
> +        """Wait for the specified host/port to become available."""

This documentation is wrong now that it picks up `self.host` and `self.port`.

::: testing/marionette/client/marionette_driver/marionette.py:648
(Diff revision 1)
> +        host = host or self.host
> +        port = port or self.port

I’m not a fan of this `or` notation.  If the first variable is falsy, this will select the second variable.

So for example with `wait_for_port(port=0)`, 0 will evaluate to false and it will pick `self.port` instead of the passed port.

Instead we need to do:

```python
if host is None:
    host = self.host
```

::: testing/marionette/client/marionette_driver/marionette.py:653
(Diff revision 1)
> +        port = port or self.port
>          timeout = timeout or self.DEFAULT_STARTUP_TIMEOUT
> -        return transport.wait_for_port(self.host, self.port, timeout=timeout)
>  
> +        poll_interval = 0.1
> +        runner = self.instance.runner if self.instance else None

`if self.instance is not None`

::: testing/marionette/client/marionette_driver/marionette.py:658
(Diff revision 1)
> +        runner = self.instance.runner if self.instance else None
> +        starttime = datetime.datetime.now()
> +
> +        while datetime.datetime.now() - starttime < datetime.timedelta(seconds=timeout):
> +            # If the instance we want to connect to is not running return immediately
> +            if runner and not runner.is_running():

`if runner is not None`

::: testing/marionette/client/marionette_driver/marionette.py:672
(Diff revision 1)
> +                if ":" in data:
> +                    return True
> +            except socket.error:
> +                pass
> +            finally:
> +                if sock:

This if-condition is unnecessary.  `sock` is always assigned.

::: testing/marionette/client/marionette_driver/marionette.py:673
(Diff revision 1)
> +                    try:
> +                        sock.shutdown(socket.SHUT_RDWR)
> +                    except:
> +                        pass

Will this ever throw?
Comment on attachment 8793728 [details]
Bug 1290372 - wait_for_port() has to return early if instance is not running.

https://reviewboard.mozilla.org/r/80410/#review80356
Attachment #8793728 - Flags: review?(ato) → review+
Comment on attachment 8793728 [details]
Bug 1290372 - wait_for_port() has to return early if instance is not running.

https://reviewboard.mozilla.org/r/80410/#review80232

> I can’t see that this is ever called with the `host` or `port` arguments?
> 
> Also I don’t think we should make this public.

The method was already public before my changes. So it would mean we change the API. I'm happy to do so in case it is wanted, but would have to check for side-effects. Please let me know.

Regarding the options I will remove them. Now that the code is in the same file, we don't need both options anymore.

> This if-condition is unnecessary.  `sock` is always assigned.

The if condition was already present in the old code. So I kept it. Keep in mind that we can have it as `None` here if the call to `sock = socket.socket()` fails above in the try block. So I will leave it in but make it explicitly a `None` check again.

> Will this ever throw?

Yes, you cannot call `sock.shutdown()` twice on the same socket. I don't see why it should happen here, but I would keep it for safety. Just in case something else shutdown the port.
Andreas, please let me know if you like the rename of the methods. Also I would like to get the feedback for the remaining open questions. Thanks.
Flags: needinfo?(ato)
Mentor: hskupin
Whiteboard: [lang=py]
Comment on attachment 8793728 [details]
Bug 1290372 - wait_for_port() has to return early if instance is not running.

https://reviewboard.mozilla.org/r/80410/#review80232

> The method was already public before my changes. So it would mean we change the API. I'm happy to do so in case it is wanted, but would have to check for side-effects. Please let me know.
> 
> Regarding the options I will remove them. Now that the code is in the same file, we don't need both options anymore.

We are already changing the API.

> The if condition was already present in the old code. So I kept it. Keep in mind that we can have it as `None` here if the call to `sock = socket.socket()` fails above in the try block. So I will leave it in but make it explicitly a `None` check again.

A ctor in the standard library is pretty much guaranteed to never have adverse side effects FWIW.
(In reply to Henrik Skupin (:whimboo) from comment #12)
> Andreas, please let me know if you like the rename of the methods. Also I
> would like to get the feedback for the remaining open questions. Thanks.

I think we should unpublish it and by prepending a _.  I consider this to be an internal method.

Other than that, I think I’ve addressed the remaining open questions.
Flags: needinfo?(ato)
Comment on attachment 8793728 [details]
Bug 1290372 - wait_for_port() has to return early if instance is not running.

https://reviewboard.mozilla.org/r/80410/#review80232

> A ctor in the standard library is pretty much guaranteed to never have adverse side effects FWIW.

Well, what should happen if an exception is thrown in the socket's __init__ method? Then sock will still be None and we will fail with the if check in the finally block.
(In reply to Andreas Tolfsen ‹:ato› from comment #14)
> I think we should unpublish it and by prepending a _.  I consider this to be
> an internal method.

That was actually already done with the last uploaded version of the patch. :)

So once we are on agreement for the last remaining issue, we can get this patch landed. Thanks.
Comment on attachment 8793728 [details]
Bug 1290372 - wait_for_port() has to return early if instance is not running.

https://reviewboard.mozilla.org/r/80410/#review80232

> Well, what should happen if an exception is thrown in the socket's __init__ method? Then sock will still be None and we will fail with the if check in the finally block.

This is not supposed to happen.

> Yes, you cannot call `sock.shutdown()` twice on the same socket. I don't see why it should happen here, but I would keep it for safety. Just in case something else shutdown the port.

Okay, but I guess there is a pretty remote chance of this happening since `sock` isn’t exposed.
(In reply to Henrik Skupin (:whimboo) from comment #16)
> So once we are on agreement for the last remaining issue, we can get this
> patch landed. Thanks.

I don’t think there is any disagreement.  I already gave the patch an r+.
Comment on attachment 8793728 [details]
Bug 1290372 - wait_for_port() has to return early if instance is not running.

https://reviewboard.mozilla.org/r/80410/#review80232

> Okay, but I guess there is a pretty remote chance of this happening since `sock` isn’t exposed.

I checked this more and I tend to agree now. It's also that this socket only exists for a split of a second, and no client can connect to it. So lets remove this code.
Pushed by hskupin@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/198c4bf0c8df
_wait_for_connection() has to return early if instance is not running. r=ato
sorry had to back this out for failures like https://treeherder.mozilla.org/logviewer.html#?job_id=5150943&repo=autoland
Flags: needinfo?(hskupin)
Backout by cbook@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a7fc516956bc
Backed out changeset 198c4bf0c8df for failing wpt tests
(In reply to Carsten Book [:Tomcat] from comment #22)
> sorry had to back this out for failures like
> https://treeherder.mozilla.org/logviewer.html#?job_id=5150943&repo=autoland

It looks like maybe even other harnesses could make use of the wait_for_port() method. Making it private might not have been a good choice. So I will get this reverted to keep API compatibility. At the same time I will also rename the method back to its original name.
Flags: needinfo?(hskupin)
I triggered another try build with wp-tests included:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=7bfddb301448
Andreas, the latest changes look better now for wp-tests. I'm also more confident with the current state as before. Can you please have another look at it? Sadly I cannot re-request r? from you.
Flags: needinfo?(ato)
r+
Flags: needinfo?(ato)
Pushed by hskupin@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/dabadf63cb44
wait_for_port() has to return early if instance is not running. r=ato
https://hg.mozilla.org/mozilla-central/rev/dabadf63cb44
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
Duplicate of this bug: 1275013
You need to log in before you can comment on or make changes to this bug.