Closed Bug 966234 Opened 10 years ago Closed 10 years ago

[OS X] A shutdown after a restart causes Mozmill to open a new Firefox process before the former one is gone

Categories

(Testing Graveyard :: Mozmill, defect, P1)

All
macOS
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: andrei, Assigned: andrei)

References

Details

(Whiteboard: [mozmill-2.0.5+])

Attachments

(4 files, 5 obsolete files)

Attached file testcase.zip (obsolete) —
This fails in some restart tests while running a full testrun.
Affects mozmill 2.0.4 with the latest mozbase changes. 


Testcase
========
I've managed to make a testcase.
Since this only fails with mozmill-automation, the testrun is a zip file containing a mercurial repository with only the needed testcase to reproduce this issue.

To reproduce use mozmill (master) and mozbase (master).

I am seeing this on OSX (10.9) in about 2 out of 3 runs.
Unzip the testcase, then:
> testrun_functional --repository=. --report=http://mozmill-crowd.blargon7.com/db/ /Applications/FirefoxNightly.app


Short Description / Steps
=========================
We might still be able to reduce the testcase.
Basically:

1. Open AMO
2. Wait for the "transitionend" event
3. Get the AMO tab
4. waitForPageLoad() on that tab
5. Wait for the "ViewChanged" event (AMO category has loaded?)
6. call restartApplication()
7. After restart call stopApplication(true)
8. While trying to reinitialise FF for the next test we see an alert with the following message: "A copy of Firefox is already open. Only one copy of Firefox can be open at a time."
I reproduce this on Mac 10.7.5
Here I meant that I reproduce it with the testcase, though I cannot on linux only on mac.
Andrei, does this also happen on other platforms? I don't see that this has been tested yet. Have you had a change to do any further investigation here? Also you tested with master only. That is not something we want to release as 2.0.4. so what about the hotfix-2.0 brnach?
Flags: needinfo?(andrei.eftimie)
Also test mozmill directly with the -p option. You know that our scripts are re-using the same profile, which mozmill doesn't do by default. that's the only thing I could imagine why there should be a difference.
Also where is the very helpful log output? I miss this again. Please add this!(In reply to Andrei Eftimie from comment #0)
> 6. call restartApplication()
> 7. After restart call stopApplication(true)

Not sure if you have both steps in the same test. I might have to check the testcase. But as it sounds this could be an issue with the test.
Attached file log1.txt (obsolete) —
> Andrei, does this also happen on other platforms? I don't see that this has been tested yet. 
Linux works fine for me.
I'll shortly have info on Windows.

> Have you had a change to do any further investigation here?
Not yet, I have only managed to make a simplified testcase, I'm still investigating this.

> Also you tested with master only. That is not something we want to release as 2.0.4. so
> what about the hotfix-2.0 brnach?
This also fails for me on the hotfix-2.0 branch.

> Also test mozmill directly with the -p option. You know that our scripts are re-using
> the same profile, which mozmill doesn't do by default. that's the only thing I could
> imagine why there should be a difference.
Bingo. Indeed, I reproduce the failure with a set profile folder.
I think I'll be able to simplify the testcase a bit.

> Not sure if you have both steps in the same test. I might have to check the testcase.
> But as it sounds this could be an issue with the test.
I've included both methods in the same test for a 1-file testcase.
They are in different tests in the mozmill-tests repo.

> Also where is the very helpful log output? I miss this again. Please add this!
Sure. Attached is a log with --debug
I'll also try running some FF debug builds to see if there's more helpful information.
Flags: needinfo?(andrei.eftimie)
The debug log looks fine. Nothing missing in shutdown. But why are we trying to reconnect if no more test has to be executed and we called shutdown? I will have an own look in a bit.
Priority: -- → P1
I tried this testcase a dozen of times but I'm not able to reproduce the problem with latest Mozmill and mozrunner at all. Andrei are you sure that you were running the tests on the latest versions?
Flags: needinfo?(andrei.eftimie)
I would have expected that the log shows the output of the testcase run, but sadly this is for the functional testrun. In the future please make sure that we provide logs for the testcases as attached to bugs. That's more helpful.
Attached file log2.txt
New log that's based on the testcase.

(There might be some print statements in there from mozrunner and mozprocess, as I'm trying to debug them)
Attachment #8369329 - Attachment is obsolete: true
Flags: needinfo?(andrei.eftimie)
We will continue to work on IRC to be able to get this investigated faster. Details will be posted later.
Attached file log3.txt
This is not with debug.

The first instance passes, the second one fails.
In the first instance notice the print statemenets:
> mozprocess wait
> mozprocess wait. cleanup
Those were inserted here:
https://github.com/mozilla/mozbase/blob/master/mozprocess/mozprocess/processhandler.py#L153-L154
I've been playing with different VENV's with changing dependencies over the last hours, and I think this should actually not block 2.0.4

The _testcase_ fails on the hotfix-2.0 branch, but I am able to run end-to-end testruns on this branch.

Only by trying to use master if fails to complete testruns. I'm still trying to figure out which dependecy might be to blame, but since its not blocking 2.0.4 this doesn't burn us right now.

We'll probably only need to fix this for the next Mozmill release.
No longer blocks: 960495
Whiteboard: [mozmill-2.0.5+]
Priority: P1 → P2
That's good to hear. So lets get this pushed out then.
Blocks: 965216
Whiteboard: [mozmill-2.0.5+] → [mozmill-2.0.5?]
So Mozmill 2.0.4 is indeed affected by this issue. Finally I'm also able to hit this problem. I'm not sure why but we are not waiting long enough for the shutdown of Firefox, and a new instance gets opened with the same profile location, and fails. I will see if there is a workaround or if we have to release a 2.0.5 release.
Assignee: nobody → hskupin
Status: NEW → ASSIGNED
I can also see the problem without the -p option, but in those cases we do not fail given that the profile Firefox wants to access is located at different locations, and having another instance open a bit earlier doesn't harm us.
Ok, so the problem here is that in Mozmill runner.wait() returns with 0 even the process is still running. This needs further investigation if it might also be mozprocess which is wrong here. I will move the components once it is clear.
It becomes interesting...

$ ps -ef | grep firefox
  501 38633 24238   0  2:36PM ttys002    0:00.56 /Volumes/data/envs/ma2/bin/python /Volumes/data/envs/ma2/bin/mozmill -b /Applications/Firefox/Nightly.app/ -m ../mozmill-tests/nightly/firefox/tests/functional/manifest.ini -p profile

This is the grep output right before Firefox gets started with the -profile option.


henrik@whimboo-mbp:/Volumes/data/code/mozmill-automation[]$ ps -ef | grep firefox
  501 38633 24238   0  2:36PM ttys002    0:00.63 /Volumes/data/envs/ma2/bin/python /Volumes/data/envs/ma2/bin/mozmill -b /Applications/Firefox/Nightly.app/ -m ../mozmill-tests/nightly/firefox/tests/functional/manifest.ini -p profile
  501 38727 38633   0  2:36PM ttys002    0:02.12 /Applications/Firefox/Nightly.app/Contents/MacOS/firefox -foreground -profile /Volumes/data/code/mozmill/profile

Here Firefox has been started with the -profile option and has the process id 38727.


henrik@whimboo-mbp:/Volumes/data/code/mozmill-automation[]$ ps -ef | grep firefox
  501 38633 24238   0  2:36PM ttys002    0:00.64 /Volumes/data/envs/ma2/bin/python /Volumes/data/envs/ma2/bin/mozmill -b /Applications/Firefox/Nightly.app/ -m ../mozmill-tests/nightly/firefox/tests/functional/manifest.ini -p profile
  501 38727 38633   0  2:36PM ttys002    0:00.00 (firefox)
  501 38733     1   0  2:36PM ttys002    0:00.19 /Applications/Firefox/Nightly.app/Contents/MacOS/firefox

This is a couple of milliseconds later. As you can see Firefox has been re-spawned and the new process got the pid 38733. The old pid 38727 our code assumes to still working with, is a zombie process now. So polling the old pid will return 0, which indicates the process has exit. 

Exactly that is the reason why we don't wait that long enough in mozrunner because we poll the old pid now directly since bug 962495 has been landed.
So this failure is very specific for the given testcase. It seems to only fail when the discovery pane is getting used in the test. When I create a couple of restart tests with shutdowns in-between I don't see this issue. But once the pane gets loaded we seem to have a delayed shutdown.

I will strip down the testcase even further to figure out the reproduction steps.
Attached file minimized_testcase.zip
This is a minimized testcase for this problem. It also happens without loading the add-on manager. Just a restart and a shutdown.
Attachment #8368515 - Attachment is obsolete: true
(In reply to Henrik Skupin (:whimboo) from comment #18)
> Exactly that is the reason why we don't wait that long enough in mozrunner
> because we poll the old pid now directly since bug 962495 has been landed.

That's indeed a regression from bug 962495 in combination with bug 965714. So we could revert those two changes for now, until we know how to really work with the mozprocess handler in such a case. What happens at the moment is:

1. Mozmill start Firefox via --profile as PID 1
2. Mozmill runs the test and Firefox restarts itself, and the new process gets PID 2
3. We call wait() for PID 1 so that the zombie process is no longer around
4. The process handler returns 0 as returncode for PID 1
5. The process handler doesn't know about PID 2!!!
6. Mozmill runs the next test and shutdown the browser
7. We call wait() still for PID 1 and it returns immediately with 0 even with PID 2 still active
8. Mozmill starts Firefox for the next test
9. Firefox cannot use the given profile because it is still in use

I don't say that the current behavior of mozrunner is wrong but it might indeed miss that new PID and we have to figure out why.
Andrei, would you mind to test the workaround? I would like to know if that also fixes your problem.
Attachment #8370120 - Flags: feedback?(andrei.eftimie)
Something seems to be horrible broken with process_handler.proc.poll(). Those three lines have been executed right after each other in runner.wait():

** returncode from wait() call: None
** self.returncode: 0
** calling wait() again: None

So why does the returncode is 0, while wait() still returns None, and shows the process as active? I think that this is the problem here. I will file a mozbase bug so we can get this fixed in mozprocess or mozrunner.
Comment on attachment 8370120 [details] [diff] [review]
workaround just to make it work again

This problem will be gone once bug 967595 is fixed and a new mozrunner version has been released.
Attachment #8370120 - Attachment is obsolete: true
Attachment #8370120 - Flags: feedback?(andrei.eftimie)
Andrei will take this.
Assignee: hskupin → andrei.eftimie
Whiteboard: [mozmill-2.0.5?] → [mozmill-2.0.5+]
The issue has been fixed (with bug 967595) in the new mozrunner version.

Bumped mozrunner to 5.34 and added a mutt test based on the testcase from this bug to cover this issue.
Attachment #8372254 - Flags: review?(hskupin)
Comment on attachment 8372254 [details] [diff] [review]
0001-Bug-966234-Bumped-mozrunner-dep-to-5.34-and-added-a-.patch

Review of attachment 8372254 [details] [diff] [review]:
-----------------------------------------------------------------

::: mutt/mutt/tests/python/js-modules/restartTests/testRestartThenShutdown.js
@@ +1,1 @@
> +/* This Source Code Form is subject to the terms of the Mozilla Public

Please move the js files into a new subfolder. I don't want that 'restartTests' are getting polluted and we do not see the relationship of test files.

@@ +11,5 @@
> +
> +function test2() {
> +  var controller = mozmill.getBrowserController();
> +  controller.stopApplication(true);
> +}

Please use a setupTest method to get the browser controller.

::: mutt/mutt/tests/python/js-modules/restartTests/testShutdown.js
@@ +5,5 @@
> +"use strict";
> +
> +function test1() {
> +  var controller = mozmill.getBrowserController();
> +  controller.stopApplication(true);

The whole test should have only a single expect.pass() line. There is no need to call shutdown again, which will happen anyway.

::: mutt/mutt/tests/python/test_shutdown_restart.py
@@ +31,5 @@
> +
> +    def test_restart_then_shutdown(self):
> +        test_paths = [os.path.join("js-modules", "restartTests", "testRestartThenShutdown.js"),
> +                      os.path.join("js-modules", "restartTests", "testShutdown.js")]
> +        self.do_test(test_paths=test_paths, passes=3)

Looks like we misunderstood each other. Sorry for that. But this test should really be part of test_restart.py but as a separate test method.
Attachment #8372254 - Flags: review?(hskupin) → review-
- changed the test to be part of the test_restart.py file
- moved the js tests into a new folder
- fixed broken restart tests (count was wrong when some of the JS tests were skipped + 1 badly named test function
- changed test2.js to only have expect.pass()
- used setupTest to set up the controller for test1.js
Attachment #8372254 - Attachment is obsolete: true
Attachment #8372329 - Flags: review?(hskupin)
Comment on attachment 8372329 [details] [diff] [review]
0001-Bug-966234-Bumped-mozrunner-dep-to-5.34-and-added-a-.patch

Review of attachment 8372329 [details] [diff] [review]:
-----------------------------------------------------------------

::: mutt/mutt/tests/python/test_restart.py
@@ +10,5 @@
>  
>  
>  class TestRestartTests(unittest.TestCase):
>  
> +    def do_test(self, test_paths=None, manifest_path=None,

Please leave it as test_path.

@@ +46,5 @@
> +        self.do_test(test_paths=testpaths, passes=3)
> +
> +    def test_restart_then_shutdown(self):
> +        testpaths = [os.path.join("js-modules", "shutdownRestartTests", "testRestartThenShutdown.js"),
> +                     os.path.join("js-modules", "shutdownRestartTests", "testShutdown.js")]

You want to specify the folder the tests are in, or a manifest file. If you want to do the latter, one other python test module already handles that extra parameter. But for now I think the folder is fine if you rename the test files to test1.js and test2.js. Also the folder should be named like 'test_shutdownAfterRestart'.
Attachment #8372329 - Flags: review?(hskupin) → review-
Thanks for the tip.
I got it working via a manifest file.
Attachment #8372329 - Attachment is obsolete: true
Attachment #8372355 - Flags: review?(hskupin)
Comment on attachment 8372355 [details] [diff] [review]
0001-Bug-966234-Bumped-mozrunner-dep-to-5.34-and-added-a-.patch

Review of attachment 8372355 [details] [diff] [review]:
-----------------------------------------------------------------

This patch looks great and the test passes with mozrunner 5.34. I will get it landed once I slightly updated the commit message.
Attachment #8372355 - Flags: review?(hskupin) → review+
https://github.com/mozilla/mozmill/commit/4b595af93353d282debb5725056ce0e363b3062e (master)
https://github.com/mozilla/mozmill/commit/12933c1eebe184b7e1b1b98b5c06badb45d01167 (hotfix-2.0)

Thanks Andrei!
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Flags: in-testsuite+
Priority: P2 → P1
Resolution: --- → FIXED
Summary: Testrun fails with "A copy of Firefox is already open. Only one copy of Firefox can be open at a time." → A shutdown after a restart causes Mozmill to open a new Firefox process before the former one is gone
This bug was about the problem we have seen on OS X. Sadly as it looks like we now have the same issue for Windows 8 and 8.1 only. I will file a follow-up bug for it.
Summary: A shutdown after a restart causes Mozmill to open a new Firefox process before the former one is gone → [OS X] A shutdown after a restart causes Mozmill to open a new Firefox process before the former one is gone
Product: Testing → Testing Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: