Closed Bug 1001322 Opened 6 years ago Closed 5 years ago

Move wait_for_port outside of marionette client

Categories

(Testing :: Marionette, defect, P2)

x86_64
Linux
defect

Tracking

(Not tracked)

RESOLVED FIXED
mozilla36

People

(Reporter: zcampbell, Assigned: davehunt)

References

Details

(Whiteboard: [affects=webqa][affects=b2g])

Attachments

(1 file, 1 obsolete file)

The marionette client has the method wait_for_port. It does not use Marionette in any way, but is inside the client so requires a session, which is impossible because the port isn't open yet.

This wait is quite useful for waiting for Gecko (marionette server started) to be ready on Firefox OS. 

If this were moved into the framework then we could use it to wait for gecko to start after crashes and so forth.
We could also just make this a class method.
(In reply to Jonathan Griffin (:jgriffin) from comment #2)
> We could also just make this a class method.

Looking at the code more closely, it looks like it also looks for a bit of the marionette protocol when waiting for the port, so keeping this a part of marionette seems like a better idea. It looks like various things (i.e. gaiatest) already use marionette's wait_for_port() so unfortunately we can't make it a classmethod/staticmethod without breaking the API. Maybe we could put a static method called wait_for_port(host, port, timeout) into marionette_transport and then make marionette's existing methods use that. It may also be worth waiting for ahal's emulator/runner refactoring (bug 997244) before landing this, as that will also change some of the logic around port waiting.
I guess this would be more accurately named `wait_for_marionette_port`

I'm totally OK with breaking gaiatest. With this I can improve gaiatest's handling of crashes. If this bug isn't fixed I'll probably dupe this method into gaiatest anyway.
My feeling is that this would be better served as a "private" method that is called when we do 

>  marionette.start_session()

This can then throw an error when it doesnt connect during the `newSession` code. I think having this as a public method on the API doesnt keep the API as clean as it should be.

I do agree with Will that this could make a great extension to the Moznetwork API since there might be other things that we want to use it for. e.g. checking a webserver is ready when starting tests in the web-platform-tests.
(In reply to David Burns :automatedtester from comment #5)
> My feeling is that this would be better served as a "private" method that is
> called when we do 
> 
> >  marionette.start_session()
> 
> This can then throw an error when it doesnt connect during the `newSession`
> code. I think having this as a public method on the API doesnt keep the API
> as clean as it should be.
> 
Sounds good, just that we should let users pass in a timeout value if they don't want the default, so something like: marionette.session(timeout=180) if they're willing to wait 3 minutes before failing.
Blocks: 967826
(In reply to Malini Das [:mdas] from comment #6)
> (In reply to David Burns :automatedtester from comment #5)
> > My feeling is that this would be better served as a "private" method that is
> > called when we do 
> > 
> > >  marionette.start_session()
> > 
> > This can then throw an error when it doesnt connect during the `newSession`
> > code. I think having this as a public method on the API doesnt keep the API
> > as clean as it should be.
> > 
> Sounds good, just that we should let users pass in a timeout value if they
> don't want the default, so something like: marionette.session(timeout=180)
> if they're willing to wait 3 minutes before failing.

I have a different idea about this now. Yes, I think that start_session should use wait_for_port, but I don't think that we should expect users to know to use "start_session" for testing if a connection is possible.

My reasoning is because as a consumer of the marionette package, if I want to test if marionette's port is active and if the server can respond, it isn't clear to use "start_session". That function name seems like it's reserved only for starting the session, not checking server status. It would be clearer to do something like I've suggested in https://bugzilla.mozilla.org/show_bug.cgi?id=1019210#c0, so the user can just do "check_marionette_responsive" instead of doing "start_session();delete_session()". The check_marionette_responsive function will be very thorough, so it will tell you if the port is up, if you can start and if you can delete sessions successfully.
Since my concerns are niche, we can drop them, but having access to wait_for_port is needed.

In response to the original description, we don't actually need a session to call wait_for_port, we can just do:
m = Marionette()
m.wait_for_port()

and this doesn't create a session, it just calls wait_for_port. The only difference is that we need to create the marionette object, since it's not a class function.

I'm fine with making this a class function, or moving it to transport.py. I'd just like to get some traction on this so people can use it without confusion.
(In reply to Malini Das [:mdas] from comment #8)

> I'm fine with making this a class function, or moving it to transport.py.
> I'd just like to get some traction on this so people can use it without
> confusion.

My vote is still to put this in the transport. We really shouldn't break the public interface of marionette if we can help it.
(In reply to Malini Das [:mdas] from comment #8)
> Since my concerns are niche, we can drop them, but having access to
> wait_for_port is needed.
> 
> In response to the original description, we don't actually need a session to
> call wait_for_port, we can just do:
> m = Marionette()
> m.wait_for_port()
> 
> and this doesn't create a session, it just calls wait_for_port. The only
> difference is that we need to create the marionette object, since it's not a
> class function.
> 

but in the context of using MarionetteTestRunner, if you call GaiaTestCase.setUp() to create the marionette instance, it immediately creates a session afterwards. It is all wrapped up inside CommonTestCase.
My emulator.py/mozrunner refactor includes a wait_for_port for device mozrunner objects.. not sure if that satisfies the initial problem or not. Also, it definitely breaks the public API.. so if my refactor isn't good enough and this requires breaking changes.. let me know and I can roll it in.
(In reply to Zac C (:zac) from comment #10)
> (In reply to Malini Das [:mdas] from comment #8)
> > Since my concerns are niche, we can drop them, but having access to
> > wait_for_port is needed.
> > 
> > In response to the original description, we don't actually need a session to
> > call wait_for_port, we can just do:
> > m = Marionette()
> > m.wait_for_port()
> > 
> > and this doesn't create a session, it just calls wait_for_port. The only
> > difference is that we need to create the marionette object, since it's not a
> > class function.
> > 
> 
> but in the context of using MarionetteTestRunner, if you call
> GaiaTestCase.setUp() to create the marionette instance, it immediately
> creates a session afterwards. It is all wrapped up inside CommonTestCase.

Right, that was the goal of setUp, to give you a working Marionette instance. The use of wait_for_port is just to test if Marionette is up. Am I missing your concern?
(In reply to Andrew Halberstadt [:ahal] from comment #11)
> My emulator.py/mozrunner refactor includes a wait_for_port for device
> mozrunner objects.. not sure if that satisfies the initial problem or not.
> Also, it definitely breaks the public API.. so if my refactor isn't good
> enough and this requires breaking changes.. let me know and I can roll it in.

discussed offline: it would be ideal to pull wait_for_port out of being an object method so it can be easily used and imported.


If we pull to transport.py, ahal said it can be removed from mozrunner, and I could remove it from marionette.py. The fallout is that we'd have to update anything that uses wait_for_port to use the new, non-object method one, and a lot of that code is out of hg trees and in random automation scripts. May cause some breakage if we miss some...
(In reply to Malini Das [:mdas] from comment #12)
> (In reply to Zac C (:zac) from comment #10)
> > (In reply to Malini Das [:mdas] from comment #8)
> > > Since my concerns are niche, we can drop them, but having access to
> > > wait_for_port is needed.
> > > 
> > > In response to the original description, we don't actually need a session to
> > > call wait_for_port, we can just do:
> > > m = Marionette()
> > > m.wait_for_port()
> > > 
> > > and this doesn't create a session, it just calls wait_for_port. The only
> > > difference is that we need to create the marionette object, since it's not a
> > > class function.
> > > 
> > 
> > but in the context of using MarionetteTestRunner, if you call
> > GaiaTestCase.setUp() to create the marionette instance, it immediately
> > creates a session afterwards. It is all wrapped up inside CommonTestCase.
> 
> Right, that was the goal of setUp, to give you a working Marionette
> instance. The use of wait_for_port is just to test if Marionette is up. Am I
> missing your concern?

Well it doesn't give us a working Marionette session then.
I guess the confusion comes from the fact that I raised this bug in order to work around the issue in the test framework but you're quite keen on sorting it all out at the test runner level now.

You can see how we are working around it here:
https://github.com/mozilla-b2g/gaia/blob/master/tests/python/gaia-ui-tests/gaiatest/gaia_test.py#L628
(In reply to Zac C (:zac) from comment #14)
> You can see how we are working around it here:
> https://github.com/mozilla-b2g/gaia/blob/master/tests/python/gaia-ui-tests/
> gaiatest/gaia_test.py#L628

Discussed in IRC, if we do what AutomatedTester suggests in Comment#5 would help out here, since InvalidResponseException was being caught when Marionette was in an intermediate state while b2g was restarted (by the device itself, during a test). If we call wait_for_port within start_session(), at least the server will be contacted after we get the expected response in wait_for_port, or will throw a clearer message about the port not being ready.

zac also mentioned adding this to delete_session()
I'd like to summarize:

a) We'd like to have wait_for_port called within start_session()/delete_session(), which should be fine.

b) It would be useful to have wait_for_port pulled out of being an object method, but this change is not backwards compatible if we remove it from mozrunner/marionette.py and it may break things.
(In reply to Malini Das [:mdas] from comment #16)
> I'd like to summarize:
> 
> a) We'd like to have wait_for_port called within
> start_session()/delete_session(), which should be fine.

Why in delete_session()? If we call delete session and it dies we just continue the clean up and move on. If there is an issue it is likely in the test runner around this area (unless I am missing something)

> 
> b) It would be useful to have wait_for_port pulled out of being an object
> method, but this change is not backwards compatible if we remove it from
> mozrunner/marionette.py and it may break things.

I am ok with having duplicate code here. Having it in marionette client means that we arent only thinking about B2G and we can make sure that the browser/device is in a fit state before carrying on.
(In reply to David Burns :automatedtester from comment #17)
> Why in delete_session()? If we call delete session and it dies we just
> continue the clean up and move on. If there is an issue it is likely in the
> test runner around this area (unless I am missing something)

I heard the suggestion from Zac, but I'm not sure exactly why it would be useful. Zac?
Flags: needinfo?(zcampbell)
The details of the solution do not concern me so much, but we just do have an outstanding bug (bug 967826) in which the tearDown can fail if the device has crashed and the objective of this bug was to enable me to solve that bug :)
Flags: needinfo?(zcampbell)
Blocks: 982158
(In reply to Zac C (:zac) from comment #19)
> The details of the solution do not concern me so much, but we just do have
> an outstanding bug (bug 967826) in which the tearDown can fail if the device
> has crashed and the objective of this bug was to enable me to solve that bug
> :)

Okay, I think adding it to start_session makes sense because we want to make sure the server is up and running before we interact with it. Adding it to delete_session is a separate concern, where you want to make sure the server hasn't crashed, and that's something that can happen at any time and during/before any method we call, so tearDown() should be modified to deal with that, not just delete_session(), so I think we're at a point where we just want to add wait_for_port to start_session, and have it available in transport.py
Priority: -- → P2
Whiteboard: [affects=webqa]
Whiteboard: [affects=webqa] → [affects=webqa][affects=b2g]
The original reason for this bug was that it would be useful to call wait_for_port without a Marionette session, but that turns out to not in fact be a requirement - it's only necessary to have an instance of a Marionette object. Does this mean the bug is invalid, or is there still value in moving the wait_for_port implementation to transport.py and calling it from start_session?

If so, is the idea to make wait_for_port a static method in transport.py and to have wait_for_port in marionette.py use the new static method? This would keep the Marionette client API identical but also allow others to use the method without an instance of Marionette.

The other suggestion was to call wait_for_port inside start_session, which would make it unnecessary for consumers to do this themselves before attempting to start a session. The only issue I see here is that wait_for_port appears to have a 5 second sleep once it has a valid response. This means that any code first waiting for the port before starting a session will essentially double this sleep to 10 seconds. Even instantiating a Marionette object with a binary path calls wait_for_port and therefore will incur this additional penalty when the session is started.

Requesting info from Zac as he raised the original bug and Malini for some suggestions for how (and if) we should proceed.
Flags: needinfo?(zcampbell)
Flags: needinfo?(mdas)
Here's a patch that does what I believe has been requested in the latest comments. My questions and needinfos from my previous comment still stand though.

Try:
https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=07e50f42ba6e
Assignee: nobody → dave.hunt
Status: NEW → ASSIGNED
Attachment #8516266 - Flags: feedback?(mdas)
I'm not sure that this bug hasn't been superseeded by the B2GDeviceRunner which handles crashes very well. My original interest was a very high level way of knowing when Gecko has started - at the time Marionette wasn't doing that and we were suffering a lot of b2g crashes.

A lot has changed in Marionette since, I'll have to check in marionette a bit more to see if this is still needed, keeping ni?
Dave, I'm just checking over my comments and my original need for this was to be able to call wait_for_port during GaiaTestCase setUp but before MarionetteTestCase setUp, as a way to wait for Gecko to have started up, from which point we could safely start the session. At the time I couldn't break up those steps in GaiaTestCase because MarionetteTestCase.setUp did it all.

If I recall, the crash handling was broken because MarionetteTestCase was expecting a particular exception when b2g was crashed, but someone changed the exception raised so the crash handling was never triggered. We ended up fixing it by updating the expected exception.

However I thought that waiting for the marionette port to open would be a bit more "high level" and less sensitive to framework code changes.

If you think B2GDeviceRunner has (or will) fix up that problem then this can be WONTFIX'd for my use case.
Flags: needinfo?(zcampbell)
Comment on attachment 8516266 [details] [diff] [review]
Move wait_for_port to transport.py and call when starting a new session. v1.0

Review of attachment 8516266 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks, this looks great

::: testing/marionette/client/marionette/marionette.py
@@ +586,5 @@
>  
>      def wait_for_port(self, timeout=60):
> +        return MarionetteTransport.wait_for_port(self.host,
> +                                                 self.port,
> +                                                 timeout=timeout)

Nice, this won't cause downstream breakage:)

::: testing/marionette/transport/marionette_transport/transport.py
@@ +125,5 @@
> +                    return True
> +            except socket.error:
> +                pass
> +            time.sleep(1)
> +        return False

This method has changed since then, and we've removed the 5 second wait, which makes this even better for its usage in newSession
Attachment #8516266 - Flags: feedback?(mdas) → feedback+
Flags: needinfo?(mdas)
Comment on attachment 8522862 [details] [diff] [review]
Move wait_for_port to transport.py and call when starting a new session. v1.1

Try:
https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=5478699f6495
Attachment #8522862 - Attachment description: Move wait_for_port to transport.py and call when starting a new session. v1.0 → Move wait_for_port to transport.py and call when starting a new session. v1.1
Attachment #8522862 - Flags: review?(mdas)
Attachment #8516266 - Attachment is obsolete: true
Attachment #8522862 - Flags: review?(mdas) → review+
https://hg.mozilla.org/mozilla-central/rev/e19b12356553
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla36
You need to log in before you can comment on or make changes to this bug.