Closed Bug 975068 Opened 11 years ago Closed 11 years ago

Show more descriptive error when Firefox doesn't restart as requested and jsbridge cannot connect

Categories

(Testing Graveyard :: Mozmill, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: whimboo, Assigned: whimboo)

References

(Depends on 1 open bug)

Details

(Whiteboard: [mozmill-2.1+])

Attachments

(1 file, 11 obsolete files)

112.33 KB, patch
whimboo
: review+
Details | Diff | Splinter Review
Lately we have seen again a lot of jsbridge disconnect issues on Linux and OS X which weren't clear what it is. But as what I have seen today, Firefox doesn't restart if it should when triggered via controller.restartApplication(). It simply quit. I filed this as bug 974971. Whatever happens here, we should give a better error message.
Assignee: nobody → hskupin
Blocks: 971796
Status: NEW → ASSIGNED
Whiteboard: [mozmill-2.0.6+]
This will become actually a too massive change, so I don't want to have this part of Mozmill 2.0.6. Lets wait for it for version 2.1.
Whiteboard: [mozmill-2.0.6+] → [mozmill-2.1+]
In case of a crash it will look like: PROCESS-CRASH | /mozilla/code/mozmill/testcase/test2.js | application crashed [Unknown top frame] Crash dump filename: /home/henrik/.mozilla/firefox/1068yp1k.mozmill/minidumps/3c81d52e-03a6-efee-0641c0a1-0ade65c5.dmp No symbols path given, can't process dump. MINIDUMP_STACKWALK not set, can't process dump. mozcrash INFO | Saved minidump as /home/henrik/.mozilla/firefox/Crash Reports/pending/3c81d52e-03a6-efee-0641c0a1-0ade65c5.dmp mozcrash INFO | Saved app info as /home/henrik/.mozilla/firefox/Crash Reports/pending/3c81d52e-03a6-efee-0641c0a1-0ade65c5.extra TEST-UNEXPECTED-FAIL | Application crashed (exit code: -136) Manual close (Ctrl+Q) during startup: TEST-UNEXPECTED-FAIL | Application unexpectedly closed (exit code: -139) RESULTS | Passed: 0 RESULTS | Failed: 1 RESULTS | Skipped: 0 Manual close (Ctrl+Q) while test is run: TEST-START | /mozilla/code/mozmill/test1.js | testRestart TEST-PASS | /mozilla/code/mozmill/test1.js | testRestart (Should not happen but is another problem) TEST-END | /mozilla/code/mozmill/test1.js | finished in 3135ms TEST-UNEXPECTED-FAIL | Application unexpectedly closed (exit code: -139) RESULTS | Passed: 1 RESULTS | Failed: 1 RESULTS | Skipped: 0 Manual restart of Firefox while a test is running: TEST-UNEXPECTED-FAIL | Connection to application lost (exit code: None) TEST-UNEXPECTED-FAIL | Disconnect Error: Application unexpectedly closed (Have to add exit code here too) RESULTS | Passed: 0 RESULTS | Failed: 1 RESULTS | Skipped: 0 Restart test but Firefox failed to restart and quit itself: TEST-UNEXPECTED-FAIL | Application unexpectedly closed (exit code: -27) RESULTS | Passed: 4 RESULTS | Failed: 1 RESULTS | Skipped: 0
Attached patch WIP v1 (obsolete) — Splinter Review
This is a WIP patch, which has most stuff included but is not ready yet. Andrei, it would be great if you could test various disconnect situations across platforms next week. If possible also with a debug build on one platform.
Attachment #8380150 - Flags: feedback?(andrei.eftimie)
Attached file results.txt (obsolete) —
Attached is a more complete log. I din't know how to do 2 of the tests you mentioned, mainly: > Manual restart of Firefox while a test is running: > Restart test but Firefox failed to restart and quit itself: TL;DR version: - I ran on OSX 10.9 and Windows 8.1 - I ran a debug build on OSX - I ran this with the Nightly Tester Tool for the simulated crash (so the remote/[..]/InstallAddonWithoutEula tests) - I only got (exit code: 0) and (exit code: None) on both platforms - (exit code: None) was when I manually quit _after_ a restart has been issued - with the debug build there was message at all logged during a crash
Comment on attachment 8380150 [details] [diff] [review] WIP v1 Review of attachment 8380150 [details] [diff] [review]: ----------------------------------------------------------------- As mentioned in the previous comment I got no error codes on wither OSX or Windows
Attachment #8380150 - Flags: feedback?(andrei.eftimie) → feedback-
Blocks: 970594
No longer blocks: 971796
Blocks: 994658
I'm not sure how you crashed Firefox but when I run this patch on OS X 10.9 and call one of the crash methods of NTT, I always get an exit code of 1.
Flags: needinfo?(andrei.eftimie)
I ran what I described in comment 4 and got: > TEST-UNEXPECTED-FAIL | Application crashed (exit code: 0) This with the latest Nightly Debug build. Lets get more specific steps of what I did: 1. Get Latest Nightly Debug build on OSX64 2. Apply this WIP patch on mozmill 3. Get mozmill-tests and add a long sleep in /firefox/tests/remote/restartTests/testAddons_InstallAddonWithoutEULA/test2.js (right at the beginning of the test function) 4. run: > mozmill -m firefox/tests/remote/restartTests/testAddons_InstallAddonWithoutEULA/manifest.ini -b /Applications/FirefoxNightly.app/ This installed Nightly Tester Tools in test1, and during the sleep time added in test2 go to Tools > Nightly Tester Tools > Crashme! > (Any Option, I chose `division by zero`). Result: > PROCESS-CRASH | /Users/andrei.eftimie/work/mozilla/bugs/975068/mozmill-tests/firefox/tests/remote/restartTests/testAddons_InstallAddonWithoutEULA/test2.js | application crashed [Unknown top frame] > Crash dump filename: /var/folders/9l/sn_p3bw914s360j602z20jsc0000gq/T/tmpSO_8tj.mozrunner/minidumps/036AAB70-3CF9-43DD-A07C-858ECD658691.dmp > No symbols path given, can't process dump. > MINIDUMP_STACKWALK not set, can't process dump. > mozcrash INFO | Saved dump as /Users/andrei.eftimie/Library/Application Support/Firefox/Crash Reports/pending/036AAB70-3CF9-43DD-A07C-858ECD658691.dmp > TEST-UNEXPECTED-FAIL | Application crashed (exit code: 0)
Flags: needinfo?(andrei.eftimie)
We talked on IRC and Andrei also sees other exit codes with a pre-configured profile. He will investigate why 0 was shown before, and reply back soon.
Attached file results2.txt (obsolete) —
With a preconfigured profile (Installed NTT and Restartless Restart addons) I got the attached results. tl;dr: I only got different exit codes with the NTT induced crash. The regular build generated `exit code 3` and a debug build generated `exit code -11`. Quit or Restart got 0 or None.
(In reply to Andrei Eftimie from comment #9) > With a preconfigured profile (Installed NTT and Restartless Restart addons) > I got the attached results. Looks fine to me except the case for 'Manual close (Cmd+Q) during test' for a regular build. Why does it have None as exit code? Is the process taking longer to shutdown for you? > tl;dr: Not really. :) > I only got different exit codes with the NTT induced crash. > The regular build generated `exit code 3` and a debug build generated `exit > code -11`. Debug builds may behave differently and you could even get other crash types. So everything which is not 0 or none is fine here for now. > Quit or Restart got 0 or None. For quit you should get 0, and if you do a restart during a test 'None' should be shown because we lost the connection due to the restart, and Firefox is still running. Mind checking that again?
(In reply to Henrik Skupin (:whimboo) from comment #10) > For quit you should get 0, and if you do a restart during a test 'None' > should be shown because we lost the connection due to the restart, and > Firefox is still running. > > Mind checking that again? This is correct with 1 addendum. I also get None if I quit _during_ a test. If I quit before a test starts I get 0. I always get None for a restart.
(In reply to Andrei Eftimie from comment #11) > I also get None if I quit _during_ a test. Well, it depends on how long Firefox takes to quit. You know that we lost the jsbridge connection once you are going to close Firefox. That is detected, and we want to bail out on the Python side. We check the current exit code, which might still be None. This is similar to the shutdown issue with modal dialogs open. See bug 794020 where a fix will land once this bug is done. So please check that in detail. > If I quit before a test starts I > get 0. Firefox exits fast enough, so we get a successful state when querying the exit code. > I always get None for a restart. For what kind of restart test? A single module with multiple tests included, or multiple modules executed after each other?
(In reply to Henrik Skupin (:whimboo) from comment #12) > > I always get None for a restart. > > For what kind of restart test? A single module with multiple tests included, > or multiple modules executed after each other? A single module with a long sleep. Manually triggering a restart using the Restartless Restart addon.
(In reply to Andrei Eftimie from comment #13) > A single module with a long sleep. Manually triggering a restart using the > Restartless Restart addon. Ok, a restart is fast, so I assume that Firefox already runs the new process when we check for the exit code. Or if it is slow, you might hit the same issue as above. So IMO this is the right status code to me.
Comment on attachment 8380150 [details] [diff] [review] WIP v1 Please check the remaining quit behavior, and give me feedback if it finally works. If that is the case I would like to finalize this patch. Thanks.
Attachment #8380150 - Flags: feedback- → feedback?(andrei.eftimie)
I'm not sure what the expected behaviour should be. Start a test which only has a sleep(). Cmd+Q before the test starts: > (mozmill)sl77-146:testcase andrei.eftimie$ mozmill -t test1.js --profile=../profiles/p1/ -b /Applications/FirefoxNightly.app/ > TEST-UNEXPECTED-FAIL | Application unexpectedly closed (exit code: 0) > RESULTS | Passed: 0 > RESULTS | Failed: 1 > RESULTS | Skipped: 0 Cmd+Q after the test has started: > (mozmill)sl77-146:testcase andrei.eftimie$ mozmill -t test1.js --profile=../profiles/p1/ -b /Applications/FirefoxNightly.app/ > TEST-START | /Users/andrei.eftimie/work/mozilla/bugs/975068/testcase/test1.js | test1 > TEST-PASS | /Users/andrei.eftimie/work/mozilla/bugs/975068/testcase/test1.js | test1 > TEST-END | /Users/andrei.eftimie/work/mozilla/bugs/975068/testcase/test1.js | finished in 1819ms > TEST-UNEXPECTED-FAIL | Connection to application lost (exit code: None) > RESULTS | Passed: 1 > RESULTS | Failed: 1 > RESULTS | Skipped: 0 Same results with the patch from bug 794020 applied.
(In reply to Andrei Eftimie from comment #16) > Cmd+Q before the test starts: > > (mozmill)sl77-146:testcase andrei.eftimie$ mozmill -t test1.js --profile=../profiles/p1/ -b /Applications/FirefoxNightly.app/ [..] > > TEST-UNEXPECTED-FAIL | Application unexpectedly closed (exit code: 0) > Cmd+Q after the test has started: > > TEST-UNEXPECTED-FAIL | Connection to application lost (exit code: None) Well, those are different code paths, and I don't know which internals of Firefox might cause a delayed shutdown. In the case as you describe above for quitting while a normal test is running, we might indeed not quit with None. But that certainly looks like a different bug and should not be covered here. > Same results with the patch from bug 794020 applied. That bug is not related at all here. So it will not make a difference for you.
Comment on attachment 8380150 [details] [diff] [review] WIP v1 Review of attachment 8380150 [details] [diff] [review]: ----------------------------------------------------------------- Then I assume we are good here.
Attachment #8380150 - Flags: feedback?(andrei.eftimie) → feedback+
(In reply to Henrik Skupin (:whimboo) from comment #17) > > > TEST-UNEXPECTED-FAIL | Application unexpectedly closed (exit code: 0) > > Cmd+Q after the test has started: > > > TEST-UNEXPECTED-FAIL | Connection to application lost (exit code: None) > > Well, those are different code paths, and I don't know which internals of > Firefox might cause a delayed shutdown. In the case as you describe above > for quitting while a normal test is running, we might indeed not quit with > None. But that certainly looks like a different bug and should not be > covered here. We have a lot of issues right now with all versions of Firefox and delayed shutdowns. So seeing 'None' here is not surprising. Firefox hangs during shutdown and the socket most likely has already been destroyed. We might want to kill it, but that I would cover on another bug.
Andrei, can you please re-test the patch, now that bug 1009224 has been fixed? Do you still see the None value for the exit code when you quit after the test has been started?
Flags: needinfo?(andrei.eftimie)
Retested and I haven't been able to get None as an exit code. Where this was previously the case, on OSX I get 0: > Application unexpectedly closed (exit code: 0)
Flags: needinfo?(andrei.eftimie)
Attached patch Show more descriptions v1 (obsolete) — Splinter Review
Updated patch with tests for a crash, an unexpected restart, and an unexpected quit. It's not ready yet given that i have to test on other platforms. Will do that tomorrow.
Attachment #8380150 - Attachment is obsolete: true
Attachment #8384533 - Attachment is obsolete: true
Attachment #8411673 - Attachment is obsolete: true
Attached patch Show more descriptions v2 (obsolete) — Splinter Review
Andrei, it would be great if you could check the patch on OS X. I will do on Windows. Thanks.
Attachment #8423371 - Attachment is obsolete: true
Attachment #8423728 - Flags: feedback?(andrei.eftimie)
Comment on attachment 8423728 [details] [diff] [review] Show more descriptions v2 Review of attachment 8423728 [details] [diff] [review]: ----------------------------------------------------------------- Something is wrong with the relative imports (at least on my system). I can't properly test this. I'll come back a bit later to it. ::: mozmill/mozmill/__init__.py @@ +20,4 @@ > from mozrunner.utils import get_metadata_from_egg > > import jsbridge > +from .errors import * This fails for me: > File "/Users/andrei.eftimie/work/mozilla/mozmill/src/mozmill/mozmill/__init__.py", line 23, in <module> > from .errors import * > ImportError: No module named errors I get this error when running `mutt testpy [...]`
Attachment #8423728 - Flags: feedback?(andrei.eftimie) → feedback-
Attached patch Show more descriptions v2 (obsolete) — Splinter Review
I missed to add the errors.py module to the patch. That is the reason why it is not working for you. I will test myself now.
Attachment #8423728 - Attachment is obsolete: true
Attached patch Show more descriptions v2.1 (obsolete) — Splinter Review
Last updated patch was not complete. Here the one with the last commit too. I removed the restart XPI given that we can trigger that directly via Services.startup. Also updated the tests accordingly.
Attachment #8424224 - Attachment is obsolete: true
The restart test on OS X fails because we seem to successfully waiting for the process group, and the process is gone when we check the exit code (expected: None, is: 0). That doesn't seem to happen on Linux. I will have to finish this off on Monday.
Blocks: 1035187
Attached patch Show more descriptions v2.2 (obsolete) — Splinter Review
Updated patch to latest master. Not sure why but the restart tests are failing because Firefox quits right after a restart.
Attachment #8424230 - Attachment is obsolete: true
Attached patch Show more descriptions v2.3 (obsolete) — Splinter Review
There was another small mistake, which caused those failures. By passing in the exception details to handle_disconnect() all is working fine now on Linux. I will do further tests on OS X in a bit. Andrei, could you check that patch on Windows? I would appreciate it.
Attachment #8472697 - Attachment is obsolete: true
Attachment #8472898 - Flags: feedback?(andrei.eftimie)
On Windows 8 we have 3 failures: > Python Failures: > > (<test_shutdown_delayed.TestBug1009224 testMethod=test_sleep_after_quit>, 'Traceback (most recent call last):\n > File "c:\\work\\mozmill\\mutt\\mutt\\tests\\python\\test_shutdown_delayed.py", line 25, in test_sleep_after_quit\n > results = self.do_test(testpath, persisted, passes=1, fails=1, skips=0)\n > File "c:\\work\\mozmill\\mutt\\mutt\\tests\\python\\test_shutdown_delayed.py", line 58, in do_test\n > self.assertEqual(len(results.fails), fails)\nAssertionError: 0 != 1\n') > > (<test_shutdown_delayed.TestBug1009224 testMethod=test_waitfor_after_quit>, 'Traceback (most recent call last):\n > File "c:\\work\\mozmill\\mutt\\mutt\\tests\\python\\test_shutdown_delayed.py", line 37, in test_waitfor_after_quit\n > results = self.do_test(testpath, persisted, passes=1, fails=1, skips=0)\n > File "c:\\work\\mozmill\\mutt\\mutt\\tests\\python\\test_shutdown_delayed.py", line 57, in do_test\n > self.assertEqual(len(results.passes), passes)\nAssertionError: 0 != 1\n') > > (<test_shutdown_unexpected.TestShutdownUnexpected testMethod=test_unexpected_restart>, 'Traceback (most recent call last):\n > File "c:\\work\\mozmill\\mutt\\mutt\\tests\\python\\test_shutdown_unexpected.py", line 31, in test_unexpected_restart\n > passes=0, fails=1, skips=0)\n > File "c:\\work\\mozmill\\mutt\\mutt\\tests\\python\\test_shutdown_unexpected.py", line 49, in do_test\n > m.run, [{\'path\': testpath}])\n > File "C:\\Python27\\Lib\\unittest\\case.py", line 473, in assertRaises\n > callableObj(*args, **kwargs)\n > File "c:\\work\\mozmill\\mozmill\\mozmill\\__init__.py", line 444, in run\n > self.handle_disconnect(e)\n > File "c:\\work\\mozmill\\mozmill\\mozmill\\__init__.py", line 508, in handle_disconnect\n > returncode = self.runner.wait(timeout=5.)\n > File "C:\\mozmill-env\\restart\\lib\\site-packages\\mozrunner-5.35-py2.7.egg\\mozrunner\\base.py", line 112, in wait\n > self.process_handler.wait(timeout)\n > File "C:\\mozmill-env\\restart\\lib\\site-packages\\mozprocess-0.21-py2.7.egg\\mozprocess\\processhandler.py", line 825, in wait\n > self.returncode = self.proc.wait()\n > File "C:\\mozmill-env\\restart\\lib\\site-packages\\mozprocess-0.21-py2.7.egg\\mozprocess\\processhandler.py", line 163, in wait\n > self.returncode = self._wait()\n > File"C:\\mozmill-env\\restart\\lib\\site-packages\\mozprocess-0.21-py2.7.egg\\mozprocess\\processhandler.py", line 463, in _wait\n > raise OSError(err)\nOSError: IO Completion Port failed to signal process shutdown\n') (The log is not so nicely formatted since I had to use a regular CMD as I any mozmill-env I tried using was installing any new packages in /c/Python instead of the virtualenv)
Attached patch Show more descriptions v2.3 (obsolete) — Splinter Review
Updated patch with 2 tests fixed, and one test disabled for bug 794020. Now it should pass everywhere.
Attachment #8472898 - Attachment is obsolete: true
Attachment #8472898 - Flags: feedback?(andrei.eftimie)
Attachment #8473030 - Flags: feedback?(andrei.eftimie)
Comment on attachment 8473030 [details] [diff] [review] Show more descriptions v2.3 Review of attachment 8473030 [details] [diff] [review]: ----------------------------------------------------------------- Works good. ::: mutt/mutt/tests/python/js-modules/testShutdownUnexpectedCrash.js @@ +3,5 @@ > + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */ > + > +"use strict"; > + > +Components.utils.import("resource://crashme/modules/Crasher.jsm"); nit: should use Cu like the other 2 js tests.
Attachment #8473030 - Flags: feedback?(andrei.eftimie) → feedback+
Attached patch Show more descriptions v2.4 (obsolete) — Splinter Review
Updated patch for the nit Andrei mentioned in his last comment. All mutt tests including the new ones are passing across platforms. Remaining work to do is marked in the tests, and will be done shortly in other already filed bugs. Andrew, it would be great if you can find the time today to get this reviewed. I will be out next week and SV would like to run a couple of tests for already known issues with Mozmill or Firefox. Thanks.
Attachment #8473030 - Attachment is obsolete: true
Attachment #8473063 - Flags: review?(ahalberstadt)
Comment on attachment 8473063 [details] [diff] [review] Show more descriptions v2.4 Review of attachment 8473063 [details] [diff] [review]: ----------------------------------------------------------------- Cool, looks good! ::: mozmill/mozmill/__init__.py @@ +524,4 @@ > # to wait on Windows because the IO completion ports feature is taking > # care of it internally. > if not mozinfo.isWin: > + returncode = self.runner.wait(timeout=1.) I see this was here before, but '1.' looks like a typo.
Attachment #8473063 - Flags: review?(ahalberstadt) → review+
(In reply to Andrew Halberstadt [:ahal] from comment #34) > ::: mozmill/mozmill/__init__.py > @@ +524,4 @@ > > # to wait on Windows because the IO completion ports feature is taking > > # care of it internally. > > if not mozinfo.isWin: > > + returncode = self.runner.wait(timeout=1.) > > I see this was here before, but '1.' looks like a typo. What do you mean with before? I simply assigned the return value to the variable in that patch.
Flags: needinfo?(ahalberstadt)
> wait(timeout=1.) The dot after the 1 is a typo. I know you didn't add this in, but it would be good if you could fix it up while you're here.
Flags: needinfo?(ahalberstadt)
Nits fixed. It were actually two of them.
Attachment #8473063 - Attachment is obsolete: true
Attachment #8473795 - Flags: review+
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Andrei, in any case if you see broken behavior with Mozmill 2.0 (wsaeinval, jsbridge disconnects) and those are reproducible, please try the same with Mozmill 2.1. It will help us to get the necessary details.
Product: Testing → Testing Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: