Closed Bug 951628 Opened 11 years ago Closed 11 years ago

When Firefox gets closed via shutdownApplication() we do not wait until runner process has been quit

Categories

(Testing Graveyard :: Mozmill, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: andrei, Assigned: whimboo)

References

Details

(Whiteboard: [mozmill-2.0.3+])

Attachments

(1 file)

Mozmill 2.X is failing with a Jsbridge Disconnect Error. This is intermittent, but we're seeing it very often. This is what we see on the staging CI server:

> 04:15:39 Traceback (most recent call last):
> 04:15:39   File "/Users/mozilla/mozmill-ci/jenkins-master/jobs/mozilla-central_functional/workspace/mozmill-env-mac/python-lib/mozmill_automation/testrun.py", line 349, in run
> 04:15:39     self.run_tests()
> 04:15:39   File "/Users/mozilla/mozmill-ci/jenkins-master/jobs/mozilla-central_functional/workspace/mozmill-env-mac/python-lib/mozmill_automation/testrun.py", line 573, in run_tests
> 04:15:39     TestRun.run_tests(self)
> 04:15:39   File "/Users/mozilla/mozmill-ci/jenkins-master/jobs/mozilla-central_functional/workspace/mozmill-env-mac/python-lib/mozmill_automation/testrun.py", line 300, in run_tests
> 04:15:39     self._mozmill.run(tests, self.options.restart)
> 04:15:39   File "/Users/mozilla/mozmill-ci/jenkins-master/jobs/mozilla-central_functional/workspace/mozmill-env-mac/python-lib/mozmill/__init__.py", line 409, in run
> 04:15:39     frame = self.run_test_file(frame or self.start_runner(),
> 04:15:39   File "/Users/mozilla/mozmill-ci/jenkins-master/jobs/mozilla-central_functional/workspace/mozmill-env-mac/python-lib/mozmill/__init__.py", line 326, in start_runner
> 04:15:39     self.create_network()
> 04:15:39   File "/Users/mozilla/mozmill-ci/jenkins-master/jobs/mozilla-central_functional/workspace/mozmill-env-mac/python-lib/mozmill/__init__.py", line 287, in create_network
> 04:15:39     self.jsbridge_port)
> 04:15:39   File "/Users/mozilla/mozmill-ci/jenkins-master/jobs/mozilla-central_functional/workspace/mozmill-env-mac/python-lib/jsbridge/__init__.py", line 44, in wait_and_create_network
> 04:15:39     raise Exception("Cannot connect to jsbridge extension, port %s" % port)
> 04:15:39 Exception: Cannot connect to jsbridge extension, port 58833

I have seen this on Windows. There is always the following message:

> IO Completion Port unexpectedly closed

Then a notification stating:

> "Firefox is already running, but is not responding. To open a new window, you must first close the existing Firefox process, or restart the system."

Afterwards it fails with the jsbridge disconnect error mentioned above. We've had similar failures before. See bug 865690.
Andrei, by any chance do you have a minimized testcase? That would help me a lot to get this investigated and fixed.
Not at the moment. It looks related to restarts. I'll check with the testcase I made in bug 872414.
I will check with staging again, and if it still fails I might take an OS X node offline in the production cluster for a better investigation.
It might be that we indeed continue too fast in Python when the Firefox process exits. The next call to run() would then produce this failure.
So this is always happening for restart tests, and specifically for the last test module. I assume we somehow shut down Firefox incorrectly and are running into a timing issue. Interestingly, I'm not able to reproduce this issue on any of our Mac minis in mozmill-ci production. It's only happening on master in mozmill-staging. Let's see if I can debug it there, because it seems to always fail on that machine.
OK, I found the issue here, and it makes total sense. Not sure why we never noticed it before! It has existed since the very early days of Mozmill.

So what happens here is: when we shut down Firefox from within a test via frame.shutdownApplication(), a JSBridgeDisconnectError is thrown on the Python side:

https://github.com/mozilla/mozmill/blob/master/mozmill/mozmill/__init__.py#L354

We handle that correctly, but we don't take into account that the application might not have finished closing yet. So we happily continue with the next test and do NOT wait until the current mozrunner process has quit. Exactly this is causing the 'Profile already in use' disconnect we have faced a lot in the past.

So the solution here is to call runner.wait() before we continue. I'm testing a patch right now and will upload it soon.
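To illustrate the idea, here is a minimal sketch of "wait for the process after a disconnect". It is not the actual Mozmill patch: the `JSBridgeDisconnectError` class below is a local stand-in, `run_test_then_wait` and its parameters are hypothetical names, and a plain `subprocess.Popen` object takes the place of the mozrunner runner.

```python
import subprocess
import sys


class JSBridgeDisconnectError(Exception):
    """Local stand-in for the error raised when the bridge connection drops."""


def run_test_then_wait(process, run_test, timeout=60):
    """Run one test callable against an already started process.

    If the test shuts the application down, the bridge disconnects before
    the process has necessarily exited. Block until it has really gone so
    that the next launch does not collide with a still-running instance.
    Returns the exit code, or None if the process is still running.
    """
    try:
        run_test()
    except JSBridgeDisconnectError:
        # Assumption: a disconnect here means a requested shutdown.
        # wait() raises subprocess.TimeoutExpired if the process hangs.
        process.wait(timeout=timeout)
    return process.poll()


if __name__ == "__main__":
    # A short-lived child process plays the role of the application.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(0.3)"]
    )

    def test_that_quits():
        raise JSBridgeDisconnectError("application is shutting down")

    print("exit code:", run_test_then_wait(child, test_that_quits))
```

The timeout is a design choice in this sketch: without one, a shutdown blocked by something like an open dialog would hang the whole test run instead of failing it.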
Assignee: nobody → hskupin
Status: NEW → ASSIGNED
Hardware: x86 → All
Summary: Testrun fails with Disconnect Error with Firefox already running notification → When Firefox gets closed via shutdownApplication() we do not wait until runner process has been quit
Whiteboard: [mozmill-2.0.3+]
Attached patch: Patch v1
With this patch I do not see this disconnect anymore, given that we are now waiting for the process to shut down. Andrei, please test on those machines where you have seen it.
Attachment #8349975 - Flags: review?(dave.hunt)
Attachment #8349975 - Flags: feedback?(andrei.eftimie)
Blocks: 950831
Attachment #8349975 - Flags: review?(dave.hunt) → review+
Comment on attachment 8349975
Patch v1

Review of attachment 8349975:
-----------------------------------------------------------------

Works fine for me. I can now complete a testrun without disconnects:
http://mozmill-crowd.blargon7.com/#/functional/report/b646dc9797659302414b7b8a9d11e710

The fix I initially uploaded for bug 950003 now introduces another failure:
http://mozmill-crowd.blargon7.com/#/functional/report/b646dc9797659302414b7b8a9d12dd67

But I highly suspect that's another problem. We don't properly handle a dialog window here. I remember seeing this dialog window remain open, though it would eventually disappear when we restarted Firefox later on. It's possible that now that we wait for the process to finish, we end up with a timeout because of the unhandled window.

I'll raise another bug for this issue.
Attachment #8349975 - Flags: feedback?(andrei.eftimie) → feedback+
Product: Testing → Testing Graveyard