Closed
Bug 1166033
Opened 9 years ago
Closed 9 years ago
Marionette doesn't recognize when the application dies, waits for the socket timeout, and doesn't start application again for following tests
Categories
(Remote Protocol :: Marionette, defect)
Remote Protocol
Marionette
Tracking
(firefox41 fixed)
VERIFIED
FIXED
mozilla41
Tracking | Status | |
---|---|---|
firefox41 | --- | fixed |
People
(Reporter: whimboo, Assigned: jgriffin)
References
Details
Attachments
(1 file, 1 obsolete file)
While doing some tests with our Mozmill CI and getting the Firefox UI tests running, I noticed that Marionette has a somewhat strange behavior if you interact with the application during the testrun. E.g closing the application (which can also happen by a crash) seems to cause Marionette to wait for the global timeout. This is not necessary because mozrunner should tell us that the application has been closed. With a default global socket timeout of about 6min (or so?) we can safe a lot of time when running tests, and some unexpected behavior happens with the application.
Reporter | ||
Updated•9 years ago
|
Version: 38 Branch → unspecified
Reporter | ||
Comment 1•9 years ago
|
||
Problem here is that if the application gets closed somehow, Marionette doesn't handle it correctly at all. It even doesn't start a new instance of the application before continuing with the next tests! Here an example when the application gets closed in the middle of a run: [..] File "/mozilla/code/firefox/nightly/testing/marionette/transport/marionette_transport/transport.py", line 99, in send raise IOError("%s: %s" % (str(e), self.connection_lost_msg)) IOError: [Errno 32] Broken pipe: Connection to Marionette server is lost. Check gecko.log (desktop firefox) or logcat (b2g) for errors. Now it takes a while until Marionette continuous with the next test. But then no new instance of Firefox gets started and I get: [..] self.connect() File "/mozilla/code/firefox/nightly/testing/marionette/transport/marionette_transport/transport.py", line 69, in connect self.sock.connect((self.addr, self.port)) File "/usr/lib/python2.7/socket.py", line 228, in meth return getattr(self._sock,name)(*args) error: [Errno 111] Connection refused We should handle the application status more carefully and should not only rely on the socket connection and its timeout. Jonathan, you mentioned that you might be able to work on this issue. Is it still true?
Severity: normal → major
Flags: needinfo?(jgriffin)
Summary: If application closes Marionette should not wait for global timeout → Marionette doesn't recognize when the application dies, waits for the socket timeout, and doesn't start application again for following tests
Assignee | ||
Comment 2•9 years ago
|
||
Yes, I'll take this one.
Assignee: nobody → jgriffin
Status: NEW → ASSIGNED
Flags: needinfo?(jgriffin)
Assignee | ||
Comment 3•9 years ago
|
||
Do you have a test handy that illustrates this problem?
Flags: needinfo?(hskupin)
Assignee | ||
Comment 4•9 years ago
|
||
Note that handling this in case of a crash is different than handling the normal shutdown case; I don't expect to tackle the former, at least this week.
Comment 5•9 years ago
|
||
Is there a way that we can recreate the issue? (In reply to Henrik Skupin (:whimboo) from comment #1) > Problem here is that if the application gets closed somehow, Marionette > doesn't handle it correctly at all. It even doesn't start a new instance of > the application before continuing with the next tests! > > Here an example when the application gets closed in the middle of a run: > > [..] > > File > "/mozilla/code/firefox/nightly/testing/marionette/transport/ > marionette_transport/transport.py", line 99, in send > raise IOError("%s: %s" % (str(e), self.connection_lost_msg)) > > IOError: [Errno 32] Broken pipe: Connection to Marionette server is lost. > Check gecko.log (desktop firefox) or logcat (b2g) for errors. > Is this due to a crash or? > Now it takes a while until Marionette continuous with the next test. But > then no new instance of Firefox gets started and I get: > > [..] > > self.connect() > File > "/mozilla/code/firefox/nightly/testing/marionette/transport/ > marionette_transport/transport.py", line 69, in connect > self.sock.connect((self.addr, self.port)) > File "/usr/lib/python2.7/socket.py", line 228, in meth > return getattr(self._sock,name)(*args) > error: [Errno 111] Connection refused > > We should handle the application status more carefully and should not only > rely on the socket connection and its timeout. Does a new firefox appear between test files?
Reporter | ||
Comment 6•9 years ago
|
||
(In reply to David Burns :automatedtester from comment #5) > Is there a way that we can recreate the issue? Just run the firefox ui tests like "firefox-ui-tests --binary %path%". At some point simply close the application via the x button. > > IOError: [Errno 32] Broken pipe: Connection to Marionette server is lost. > > Check gecko.log (desktop firefox) or logcat (b2g) for errors. > > > > Is this due to a crash or? Nope, just closing the application. > > We should handle the application status more carefully and should not only > > rely on the socket connection and its timeout. > > Does a new firefox appear between test files? As mentioned in the beginning of that comment there is no new instance of the application coming up. Marionette still thinks the original instance is open.
Flags: needinfo?(hskupin)
Assignee | ||
Comment 7•9 years ago
|
||
If this bug primarily concerns behavior that occurs when manually closing Firefox during a test run (as opposed to closing it programmatically) I would downgrade its importance. None of our frameworks would handle this situation gracefully, and although I can imagine that it might be useful when debugging things locally, it doesn't seem to warrant a 'major' designation. I'll still look at it, however.
Reporter | ||
Comment 8•9 years ago
|
||
I feel that it can happen in any situatino the application gets closed unexpectedly. What I could do is to convert one of our Mozmill tests which triggers a crash of the application. It could help to let us understand the problem better.
Assignee | ||
Comment 9•9 years ago
|
||
WIP. This reworks the socket code to throw an exception quickly after the process dies, rather than having to wait for a socket.timeout. We'll need to catch this in the runner in order to restart the process, and probably needs some other things as well.
Reporter | ||
Comment 10•9 years ago
|
||
CC'ing Armen here given that I have seen it for update tests too, and what I read on bug 1148546 comment 113 he had some problem with the application not shutting down. Not sure how long he waited but it could be related to this bug.
Assignee | ||
Comment 11•9 years ago
|
||
Bug 1166033 - Notice when the process has died, and restart it
Assignee | ||
Updated•9 years ago
|
Attachment #8613152 -
Attachment is obsolete: true
Assignee | ||
Comment 12•9 years ago
|
||
I have a patch which works well locally; going to run through try before requesting review.
Assignee | ||
Comment 13•9 years ago
|
||
if this looks ok, I'll schedule a larger run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=dc748a4392b8
Reporter | ||
Comment 14•9 years ago
|
||
It looks lovely but sadly I cannot test because the tryserver run did not create linux64 binaries. Please let me know once you created those and I will test with our Firefox UI Tests.
Assignee | ||
Comment 15•9 years ago
|
||
Bigger try run, which will include linux64 builds: https://treeherder.mozilla.org/#/jobs?repo=try&revision=0da1af167fb8
Assignee | ||
Comment 16•9 years ago
|
||
Comment on attachment 8614357 [details] MozReview Request: Bug 1166033 - Notice when the process has died, and restart it, r=dburns Bug 1166033 - Notice when the process has died, and restart it
Attachment #8614357 -
Flags: review?(dburns)
Reporter | ||
Comment 17•9 years ago
|
||
I tested this build by closing Firefox manually while tests were running and we still do not start a new instance of the application. At least not under Linux 64bit. For the next test I see the following exception: Traceback (most recent call last): File "/home/henrik/.virtualenvs/fxtests/local/lib/python2.7/site-packages/marionette_client-0.12-py2.7.egg/marionette/marionette_test.py", line 277, in run self.setUp() File "/mozilla/code/firefox-ui-tests/firefox_puppeteer/tests/test_places.py", line 13, in setUp FirefoxTestCase.setUp(self) File "/mozilla/code/firefox-ui-tests/firefox_ui_harness/testcases/base.py", line 42, in setUp MarionetteTestCase.setUp(self, *args, **kwargs) File "/home/henrik/.virtualenvs/fxtests/local/lib/python2.7/site-packages/marionette_client-0.12-py2.7.egg/marionette/marionette_test.py", line 641, in setUp CommonTestCase.setUp(self) File "/home/henrik/.virtualenvs/fxtests/local/lib/python2.7/site-packages/marionette_client-0.12-py2.7.egg/marionette/marionette_test.py", line 410, in setUp self.marionette.start_session() File "/home/henrik/.virtualenvs/fxtests/local/lib/python2.7/site-packages/marionette_driver-0.7-py2.7.egg/marionette_driver/marionette.py", line 951, in start_session self.session = self._send_message('newSession', 'value', capabilities=desired_capabilities, sessionId=session_id) File "/home/henrik/.virtualenvs/fxtests/local/lib/python2.7/site-packages/marionette_driver-0.7-py2.7.egg/marionette_driver/decorators.py", line 36, in _ return func(*args, **kwargs) File "/home/henrik/.virtualenvs/fxtests/local/lib/python2.7/site-packages/marionette_driver-0.7-py2.7.egg/marionette_driver/marionette.py", line 687, in _send_message response = self.client.send(message) File "/mozilla/code/firefox/nightly/testing/marionette/transport/marionette_transport/transport.py", line 87, in send self.connect() File "/mozilla/code/firefox/nightly/testing/marionette/transport/marionette_transport/transport.py", line 69, in connect self.sock.connect((self.addr, self.port)) File "/usr/lib/python2.7/socket.py", line 228, in meth return getattr(self._sock,name)(*args) error: [Errno 111] Connection refused Looks like the socket which was in use before has not been freed up.
Comment 18•9 years ago
|
||
I will look at the review after Jonathan has had a look.
Assignee | ||
Comment 19•9 years ago
|
||
Hey Henrik, looking at that stack trace, it doesn't look like my patch was applied; it looks like it's still using the previous version.
Reporter | ||
Comment 20•9 years ago
|
||
But that test was done with the build which I downloaded from try: https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jgriffin@mozilla.com-0da1af167fb8/try-linux64/firefox-41.0a1.en-US.linux-x86_64.tar.bz2
Assignee | ||
Comment 21•9 years ago
|
||
This change isn't in the build at all, but in the marionette-transport and marionette-driver packages, so you'd need to download the tests.zip from that build and use those packages.
Reporter | ||
Comment 22•9 years ago
|
||
Oh darn. Sorry for that, you are totally right. So I applied your patch in my recent inbound checkout and didn't notice any difference. Both packages are used for real: >>> import marionette_driver >>> marionette_driver.__file__ '/mozilla/code/firefox/inbound/testing/marionette/driver/marionette_driver/__init__.pyc' >>> import marionette_transport >>> marionette_transport.__file__ '/mozilla/code/firefox/inbound/testing/marionette/transport/marionette_transport/__init__.pyc' Here the exceptions: File "/mozilla/code/firefox/inbound/testing/marionette/driver/marionette_driver/marionette.py", line 687, in _send_message response = self.client.send(message) File "/mozilla/code/firefox/inbound/testing/marionette/transport/marionette_transport/transport.py", line 99, in send raise IOError("%s: %s" % (str(e), self.connection_lost_msg)) IOError: [Errno 32] Broken pipe: Connection to Marionette server is lost. Check gecko.log (desktop firefox) or logcat (b2g) for errors. [..] File "/mozilla/code/firefox/inbound/testing/marionette/driver/marionette_driver/marionette.py", line 687, in _send_message response = self.client.send(message) File "/mozilla/code/firefox/inbound/testing/marionette/transport/marionette_transport/transport.py", line 103, in send response = self.receive() File "/mozilla/code/firefox/inbound/testing/marionette/transport/marionette_transport/transport.py", line 60, in receive raise IOError(self.connection_lost_msg) IOError: Connection to Marionette server is lost. Check gecko.log (desktop firefox) or logcat (b2g) for errors. [..] File "/mozilla/code/firefox/inbound/testing/marionette/driver/marionette_driver/marionette.py", line 687, in _send_message response = self.client.send(message) File "/mozilla/code/firefox/inbound/testing/marionette/transport/marionette_transport/transport.py", line 87, in send self.connect() File "/mozilla/code/firefox/inbound/testing/marionette/transport/marionette_transport/transport.py", line 69, in connect self.sock.connect((self.addr, self.port)) File "/usr/lib/python2.7/socket.py", line 228, in meth return getattr(self._sock,name)(*args) error: [Errno 111] Connection refused
Assignee | ||
Comment 23•9 years ago
|
||
Hey Henrik, This is somehow still using the old version. You can see at https://reviewboard.mozilla.org/r/9921/diff/1/#0, using this line from the stack trace: line 87, in send self.connect() File "/mozilla/code/firefox/inbound/testing/marionette/transport/marionette_transport/transport.py", That code is at that line in the original code, but not with my patch; with my patch that line happens to be: self.sock = None
Reporter | ||
Comment 24•9 years ago
|
||
Ok, not sure what really went wrong here but now I created a new venv and the only thing I can say is: AWESOME improvements!! That's exactly the behavior the harness should have when the application under test dies. Great work at least from the functional standpoint. David, nothing should block a code-review anymore.
Comment 25•9 years ago
|
||
Comment on attachment 8614357 [details] MozReview Request: Bug 1166033 - Notice when the process has died, and restart it, r=dburns https://reviewboard.mozilla.org/r/9921/#review9097 Ship It!
Attachment #8614357 -
Flags: review?(dburns) → review+
Assignee | ||
Comment 26•9 years ago
|
||
big try run to make sure this doesn't mess anything up: https://treeherder.mozilla.org/#/jobs?repo=try&revision=613f713e6ea6
Assignee | ||
Comment 28•9 years ago
|
||
Backed out for OSX Mn failures in test_profile_management.py: https://treeherder.mozilla.org/logviewer.html#?job_id=10611879&repo=mozilla-inbound
Assignee | ||
Comment 29•9 years ago
|
||
What's happening is that when Firefox restarts itself, the new instance of Firefox has a new pid, and the mozrunner instance we've used to initially launch Firefox basically loses track of the new instance. I'm not sure why this only happens on OSX..I haven't looked at what happens on the other platforms. I'm also not sure how to compensate for this without being overly fragile. The best thing I can think of is to disable the behavior that this patch implements when a test calls self.marionette.restart(in_app=True).
Reporter | ||
Comment 30•9 years ago
|
||
That's how we do restarts in Mozmill: https://github.com/mozilla/mozmill/blob/master/mozmill/mozmill/__init__.py#L549 Maybe it helps?
Assignee | ||
Comment 31•9 years ago
|
||
testing a fix on try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=032727be1a89
Assignee | ||
Comment 32•9 years ago
|
||
Comment on attachment 8614357 [details] MozReview Request: Bug 1166033 - Notice when the process has died, and restart it, r=dburns Bug 1166033 - Notice when the process has died, and restart it, r=dburns
Attachment #8614357 -
Attachment description: MozReview Request: Bug 1166033 - Notice when the process has died, and restart it → MozReview Request: Bug 1166033 - Notice when the process has died, and restart it, r=dburns
Attachment #8614357 -
Flags: review+ → review?(dburns)
Assignee | ||
Comment 33•9 years ago
|
||
(In reply to Jonathan Griffin (:jgriffin) from comment #32) > Comment on attachment 8614357 [details] > MozReview Request: Bug 1166033 - Notice when the process has died, and > restart it, r=dburns > > Bug 1166033 - Notice when the process has died, and restart it, r=dburns This version adds a 'detached' attribute to self.instance when we call restart(in_app=True), at which point we stop trying to tell whether the process is alive during socket transactions, since it won't be...a new process will, but we won't have a mozrunner or mozprocess instance which is attached to it. Try run is all green.
Comment 34•9 years ago
|
||
Comment on attachment 8614357 [details] MozReview Request: Bug 1166033 - Notice when the process has died, and restart it, r=dburns https://reviewboard.mozilla.org/r/9921/#review9795 Ship It!
Attachment #8614357 -
Flags: review?(dburns) → review+
Comment 36•9 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/4c3af36331e0
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
status-firefox41:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla41
Reporter | ||
Comment 37•9 years ago
|
||
This works indeed fine with the new releases of driver and transport. But I still see a hang when the application gets closed after it has been restarted. We may still have some wrong process references. I will file a follow-up bug for it.
Status: RESOLVED → VERIFIED
Updated•1 year ago
|
Product: Testing → Remote Protocol
You need to log in
before you can comment on or make changes to this bug.
Description
•