Application disconnect with 'client process shutdown unsuccessful'

RESOLVED FIXED

Status

defect
P1
critical
RESOLVED FIXED
7 years ago
3 years ago

People

(Reporter: whimboo, Assigned: whimboo)

Tracking

unspecified
x86_64
macOS
Dependency tree / graph
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [mozmill-2.1+])

Attachments

(1 attachment, 2 obsolete attachments)

I got this today by running a simple test like:

function test() {
  expect.fail();
}

Mozmill hangs and didn't shutdown cleanly. No way so far to reproduce.

TEST-UNEXPECTED-FAIL | Disconnect Error: Application unexpectedly closed
RESULTS | Passed: 0
RESULTS | Failed: 1
RESULTS | Skipped: 0
Traceback (most recent call last):
  File "/Volumes/data/code/mozmill/mozmill/mozmill/__init__.py", line 751, in run
    mozmill.run(tests, self.options.restart)
  File "/Volumes/data/code/mozmill/mozmill/mozmill/__init__.py", line 414, in run
    self.stop_runner()
  File "/Volumes/data/code/mozmill/mozmill/mozmill/__init__.py", line 495, in stop_runner
    raise Exception('client process shutdown unsuccessful')
Steps to reproduce:

1. Download a broken l10n nightly build which raises a yellow-screen-of-death window from here: http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2013/01/2013-01-24-05-41-58-mozilla-central-l10n/

2. Clone our mozmill tests repository 

3. Execute mozmill like: mozmill -b /Volumes/Nightly/FirefoxNightly.app/Contents/MacOS/firefox -t tests/functional/testToolbar/

Mozmill 2.0 is at least able to kill the application what Mozmill 1.5 cannot do. This is a huge issue for us on hotfix-1.5. But given that the process handling has been mostly rewritten for Mozmill 2, I will file a new bug for the 1.5 case.

Comment 2

6 years ago
Need to investigate this further to see if it is still an issue with mozmill 2.x codebase.  Potentially looking at this for 2.1 if the investigation proves that it is still in the current codebase.
Whiteboard: [mozmill-2.0?] → [mozmill-2.1?]
Posted file testcase (obsolete) —
Another testcase for such a condition. We do not handle the modal dialog and Mozmill is not able to kill the Firefox process.
The problem here is that we are trying to close Firefox through the bridge which might exist anymore or the execution of JS code (inside of stop_runner) is blocked due to a modal dialog. So the following code will fail to correctly close the browser:

                    if not self.shutdownMode:
                        # In case of an unexpected shutdown stop the runner
                        self.report_disconnect()
                        self.stop_runner()

Instead of calling self.stop_runner() we should kill the application instance and re-initialize the connection for the next test. With that done we could even resume our tests and do no longer abort the whole testrun.

Clint, what do you think?
Assignee: nobody → hskupin
Status: NEW → ASSIGNED
Flags: needinfo?(ctalbert)

Comment 5

6 years ago
(In reply to Henrik Skupin (:whimboo) from comment #4)
> The problem here is that we are trying to close Firefox through the bridge
> which might exist anymore or the execution of JS code (inside of
> stop_runner) is blocked due to a modal dialog. So the following code will
> fail to correctly close the browser:
> 
>                     if not self.shutdownMode:
>                         # In case of an unexpected shutdown stop the runner
>                         self.report_disconnect()
>                         self.stop_runner()
> 
> Instead of calling self.stop_runner() we should kill the application
> instance and re-initialize the connection for the next test. With that done
> we could even resume our tests and do no longer abort the whole testrun.
> 
> Clint, what do you think?
I think that sounds like a good idea. Let's give it a shot. The only thing to ensure is that you clear and close the connection(s) open on the JSBridge port. Alternatively, for a restart test you could restart with a different jsbridge port and allow the system to clean up the unused port. For instance for test 1 you would start with port 2424 and for test 2 you would restart with port 2425 and for the next test you'd go back to 2424 (by then the OS should have released the port). But that's kind of hacky, so let's just try to do this and ensure the connection is closed and destroyed before starting the next test.
Flags: needinfo?(ctalbert)
Mozmill 2.0.4 will get a couple of fixes regarding handling the application process during restarts. We are kinda broken there. So I would take this into account too.
Blocks: 960495
Whiteboard: [mozmill-2.1?] → [mozmill-2.0.4?]
Posted patch bug-794020.patch (obsolete) — Splinter Review
This works for me using the given testcase. If it fails on other cases, please let me know.

Thanks
Attachment #8362678 - Flags: review?(hskupin)
Comment on attachment 8362678 [details] [diff] [review]
bug-794020.patch

Review of attachment 8362678 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozmill/mozmill/__init__.py
@@ +419,4 @@
>                      if not self.shutdownMode:
>                          # In case of an unexpected shutdown stop the runner
>                          self.report_disconnect()
> +                        self.runner.stop()

This might work, but we could have some inconsistencies with e.g. self.shutdownMode which we don't correctly reset. I think that needs more thinking about a better handling of shutdown code and most likely a bit of refactoring. Also a mutt test would be good to have.

Given that parts are under works by myself on bug 959551 we might want to wait here until the other bug has been fixed.
Attachment #8362678 - Flags: review?(hskupin) → review-
Depends on: 959551
Ok, suits me. Just let me know, once you think it is a good time to try and fix this bug.

Thx for looking at it.
Blocks: 962514
Frankly I'm blocked on bug 947248 now, which I have to fix first. :/
Depends on: 947248
As it has been turned out today this bug is not that easy to fix. The fix for mozprocess will take a bit longer, which is time we do not have. Also this bug exists for along time and doesn't happen that often. So lets move it to the next release 2.0.5.
Blocks: 965216
No longer blocks: 960495, 962514
Whiteboard: [mozmill-2.0.4?] → [mozmill-2.0.5?]
I don't want to take this for 2.0.5. Lets put this into the next feature release, which might follow next.
Whiteboard: [mozmill-2.0.5?] → [mozmill-2.1?]
Blocks: 960421
No longer blocks: 965216
Duplicate of this bug: 659453
Blocks: 970594

Comment 14

5 years ago
We've had 1 failure in production on OSX.
The OSX issue should have been fixed in 2.0.4 if I'm not mistaken.

Still, here is the fail for reference:
http://mm-ci-master.qa.scl3.mozilla.com:8080/job/release-mozilla-release_functional/1481/console
http://mozmill-release.blargon7.com/#/functional/report/e35f22a68fec3f6d144713f9abad45c4
No, as I have written and also said in the ask an expert meeting this is a differnet issue, where most likely the master password dialog popped up for the password manager test, and blocked the whole mozmill thread. Please get this investigated and check if the master password test really resets the set master password!
Blocks: 974928
We want to have this in the next release of Mozmill.
Priority: -- → P1
Whiteboard: [mozmill-2.1?] → [mozmill-2.1+]
Blocks: 761605
Attachment #8362678 - Attachment is obsolete: true
Posted patch Patch v1Splinter Review
In case of a modal dialog blocking us from a shutdown, we should hard-kill the browser. As of now we try to kill it via the JS bridge, which is not successful when the connection already timed out.
Attachment #8356022 - Attachment is obsolete: true
Attachment #8538423 - Flags: review?(ahalberstadt)
Comment on attachment 8538423 [details] [diff] [review]
Patch v1

Review of attachment 8538423 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozmill/mozmill/__init__.py
@@ +476,2 @@
>              if frame:
>                  self.stop_runner()

Do we need both self.stop_runner and self.kill_runner? What's the difference between the two?
Attachment #8538423 - Flags: review?(ahalberstadt) → review+
Comment on attachment 8538423 [details] [diff] [review]
Patch v1

Review of attachment 8538423 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozmill/mozmill/__init__.py
@@ +476,2 @@
>              if frame:
>                  self.stop_runner()

stop() is doing a safe shutdown of the application, by also checking if additional application information has already been retrieved. If not, it's doing it. kill() is really hard-killing the process in case of the application doesn't respond.

I implemented this new method to not change any existent code-flow of Mozmill. Simply to reduce any kind of regression, we could introduce. I really want it to be the last release.
https://github.com/mozilla/mozmill/commit/23519d8a084346dbff3b2e338c1f3872075ec4e9 (master)
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Something was broken with the landing. I reverted the above changeset and re-pushed with a proper commit message:

https://github.com/mozilla/mozmill/compare/23519d8a0843...3c9bfcbf559f
Product: Testing → Testing Graveyard
You need to log in before you can comment on or make changes to this bug.