Closed Bug 1141483 Opened 9 years ago Closed 8 years ago

After a restart Marionette doesn't restore the previous context

Categories

(Remote Protocol :: Marionette, defect)

defect
Not set
normal

Tracking

(firefox52 fixed)

RESOLVED FIXED
mozilla52
Tracking Status
firefox52 --- fixed

People

(Reporter: whimboo, Assigned: whimboo)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

When using marionette.restart() in a test it is not possible to work with additional windows beside the already open browser window. The case below shows it for the about window, which we need for update tests.

    def test_chrome_windows_after_restart(self):
        def callback(mn):
            print '*** len(chrome_window_handles): %s' % len(mn.chrome_window_handles)
            return len(mn.chrome_window_handles) == 2

        def open():
            with self.marionette.using_context('chrome'):
                menu = self.marionette.find_element(By.ID, 'aboutName')
                menu.click()

            Wait(self.marionette).until(callback)

        open()
        self.marionette.restart(in_app=False)
        open()

Here Marionette will timeout inside the 2nd open() call, because only a single chrome window is found, even the about window has been opened.

This is blocking our update tests, and we should get this fixed as soon as possible.
(In reply to Henrik Skupin (:whimboo) from comment #0)
> When using marionette.restart() in a test it is not possible to work with
> additional windows beside the already open browser window. The case below
> shows it for the about window, which we need for update tests.
> 
>     def test_chrome_windows_after_restart(self):
>         def callback(mn):
>             print '*** len(chrome_window_handles): %s' %
> len(mn.chrome_window_handles)
>             return len(mn.chrome_window_handles) == 2
When I switch to chrome context for this line the test passes fine. I think the issue is: https://hg.mozilla.org/mozilla-central/file/56083b5a4473/testing/marionette/marionette-server.js#l406

I don't know why we have that check. Offhand I don't think it makes sense, but then again if we're working with chrome window handles switching to chrome scope isn't too much of a scandal.
> 
>         def open():
>             with self.marionette.using_context('chrome'):
>                 menu = self.marionette.find_element(By.ID, 'aboutName')
>                 menu.click()
> 
>             Wait(self.marionette).until(callback)
> 
>         open()
>         self.marionette.restart(in_app=False)
>         open()
> 
> Here Marionette will timeout inside the 2nd open() call, because only a
> single chrome window is found, even the about window has been opened.
> 
> This is blocking our update tests, and we should get this fixed as soon as
> possible.
Thanks Chris. That would be at least a workaround for me for the time being. I will let you and David figure what's best here to do.
Whiteboard: [qa-automation-blocked]
One thing to note here is that our Firefox UI tests are working in chrome scope nearly all the time. Given the restart we seem to loose the scope we were in before the restart. Maybe we should re-set the scope once the application is up again and the test continues?
(In reply to Henrik Skupin (:whimboo) from comment #3)
> One thing to note here is that our Firefox UI tests are working in chrome
> scope nearly all the time. Given the restart we seem to loose the scope we
> were in before the restart. Maybe we should re-set the scope once the
> application is up again and the test continues?

Re-instating the scope after a restart seems reasonable to me.
No longer blocks: 1129843
Summary: After a restart Marionette doesn't list all chrome_window_handles → After a restart Marionette doesn't restore the previous context
Assignee: nobody → hskupin
Status: NEW → ASSIGNED
Attachment #8794797 - Flags: review?(dburns)
Attachment #8794797 - Flags: review?(dburns)
Comment on attachment 8794797 [details]
Bug 1141483 - After a restart Marionette doesn't restore the previous context.

https://reviewboard.mozilla.org/r/81094/#review80258
Attachment #8794797 - Flags: review?(dburns) → review+
Pushed by hskupin@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/7dac104b94de
After a restart Marionette doesn't restore the previous context. r=automatedtester
https://hg.mozilla.org/mozilla-central/rev/7dac104b94de
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
Backed out in https://hg.mozilla.org/mozilla-central/rev/9baec74b3db1bf005c66ae2f50bafbdb02c3be38 - basically permaorange on WinXP (green one time in https://treeherder.mozilla.org/#/jobs?repo=autoland&noautoclassify&bugfiler&fromchange=9d576d2aab809f3ad1648c2e9e864cb3804dd52f&group_state=expanded&filter-searchStr=19a5c36317f26393e00f6ae0960d0ed7fb421e85&tochange=d19b41ff1f18b530145742b0d11abbd482082f86), complicated by the fact that your push hit a Treeherder race condition that causes the push not to be shown so we started dancing around in our clownshoes when something utterly unlikely appeared to have caused it and then backing it out appeared not to have fixed it.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla52 → ---
Bah! I run tests through try a couple of times but well, by default we never run tests on XP. So it really gives you a totally wrong assumption. Then not having the landing shown in TH is indeed awful. :(

So here one of the failing jobs:

https://treeherder.mozilla.org/logviewer.html#?job_id=4181833&repo=autoland

 17:40:35    ERROR -  TEST-UNEXPECTED-ERROR | test_quit_restart.py TestQuitRestart.test_keep_context_after_restart_by_set_context | IOError: Process killed because the connection to Marionette server is lost. Check gecko.log for errors (Reason: Timed out waiting for port 2828!)

 17:40:35     INFO -  Traceback (most recent call last):

 17:40:35     INFO -    File "C:\slave\test\build\venv\lib\site-packages\marionette\marionette_test\testcases.py", line 162, in run

 17:40:35     INFO -      testMethod()

 17:40:35     INFO -    File "C:\slave\test\build\tests\marionette\tests\testing\marionette\harness\marionette\tests\unit\test_quit_restart.py", line 136, in test_keep_context_after_restart_by_set_context

 17:40:35     INFO -      self.marionette.restart()

 17:40:35     INFO -    File "C:\slave\test\build\venv\lib\site-packages\marionette_driver\decorators.py", line 42, in _

 17:40:35     INFO -      return func(*args, **kwargs)

 17:40:35     INFO -    File "C:\slave\test\build\venv\lib\site-packages\marionette_driver\marionette.py", line 1138, in restart

 17:40:35     INFO -      self.raise_for_port(self.wait_for_port())

 17:40:35     INFO -    File "C:\slave\test\build\venv\lib\site-packages\marionette_driver\decorators.py", line 55, in _

 17:40:35     INFO -      m.force_shutdown()

 17:40:35     INFO -    File "C:\slave\test\build\venv\lib\site-packages\marionette_driver\decorators.py", line 42, in _

 17:40:35     INFO -      return func(*args, **kwargs)

 17:40:35     INFO -    File "C:\slave\test\build\venv\lib\site-packages\marionette_driver\marionette.py", line 652, in raise_for_port

17:40:35 INFO - raise socket.timeout("Timed out waiting for port {}!".format(self.port)) 


Sadly only opt builds are failing so it makes it hard to get some more information. Waybe it helps to know the order of tests as run:

16:53:37     INFO -  TEST-START | test_quit_restart.py TestQuitRestart.test_force_quit
16:53:37     INFO -  Application command: C:\slave\test\build\application\firefox\firefox.exe -no-remote -marionette -profile c:\docume~1\cltbld~1.t-x\locals~1\temp\tmpvbq_zv.mozrunner
16:53:40     INFO -  TEST-PASS | test_quit_restart.py TestQuitRestart.test_force_quit | took 2664ms
16:53:40     INFO -  TEST-START | test_quit_restart.py TestQuitRestart.test_force_restart
16:53:40     INFO -  Application command: C:\slave\test\build\application\firefox\firefox.exe -no-remote -marionette -profile c:\docume~1\cltbld~1.t-x\locals~1\temp\tmpvbq_zv.mozrunner
16:53:42     INFO -  TEST-PASS | test_quit_restart.py TestQuitRestart.test_force_restart | took 2134ms
16:53:42     INFO -  TEST-START | test_quit_restart.py TestQuitRestart.test_in_app_clean_restart
16:53:42     INFO -  TEST-PASS | test_quit_restart.py TestQuitRestart.test_in_app_clean_restart | took 93ms
16:53:42     INFO -  TEST-START | test_quit_restart.py TestQuitRestart.test_in_app_quit
16:53:43     INFO -  Application command: C:\slave\test\build\application\firefox\firefox.exe -no-remote -marionette -profile c:\docume~1\cltbld~1.t-x\locals~1\temp\tmpwfxvoh.mozrunner
16:53:45     INFO -  TEST-PASS | test_quit_restart.py TestQuitRestart.test_in_app_quit | took 3178ms
16:53:45     INFO -  TEST-START | test_quit_restart.py TestQuitRestart.test_in_app_quit_with_callback
16:53:46     INFO -  Application command: C:\slave\test\build\application\firefox\firefox.exe -no-remote -marionette -profile c:\docume~1\cltbld~1.t-x\locals~1\temp\tmppxdvox.mozrunner
16:53:48     INFO -  TEST-PASS | test_quit_restart.py TestQuitRestart.test_in_app_quit_with_callback | took 3209ms
16:53:48     INFO -  TEST-START | test_quit_restart.py TestQuitRestart.test_in_app_restart
16:53:51     INFO -  TEST-PASS | test_quit_restart.py TestQuitRestart.test_in_app_restart | took 2601ms
16:53:51     INFO -  TEST-START | test_quit_restart.py TestQuitRestart.test_in_app_restart_with_callback
16:53:53     INFO -  TEST-PASS | test_quit_restart.py TestQuitRestart.test_in_app_restart_with_callback | took 2383ms
16:53:53     INFO -  TEST-START | test_quit_restart.py TestQuitRestart.test_keep_context_after_restart_by_set_context
16:53:54     INFO -  Application command: C:\slave\test\build\application\firefox\firefox.exe -no-remote -marionette -profile c:\docume~1\cltbld~1.t-x\locals~1\temp\tmppxdvox.mozrunner
I will retrigger some jobs on Try specifically for Windows XP in the hope that I can reproduce it even with a debug build. Otherwise I will push a change which enables tracing for Marionette.
The try build which I pushed before I headed on vacation was somewhat busted and crashed over and over. So it was not that helpful:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=8c9a92182a72&selectedJob=28256787

I pushed another one right now after a rebase to current code on central, which will hopefully not show the same amount of crashes.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=26c40496416f7af5737a6b815295205bc5fb2d3c
Status: REOPENED → ASSIGNED
Here the list of test results for the time period the changes were active in autoland:

https://treeherder.mozilla.org/#/jobs?repo=autoland&fromchange=7dac104b94dec3fa419b45683a9c76e7254896b0&tochange=9baec74b3db1bf005c66ae2f50bafbdb02c3be38&selectedJob=4232261&filter-searchStr=windows%20xp%20Mn

The massive crashes seem to be identical to the ones I see on bug 1299216. So waiting for a reply from Aaron before I continue to investigate the remaining problem.
I was finally able to reproduce the issue locally on WinXP. When running this test it looks like that we run into a race condition which tries to open Firefox again before the old instance has been closed. As result the following message is shown:

Firefox is already running, but is not responding. To open a new window, you must first close the existing Firefox process, or restart your system.
So the problem here is actually not the test itself but caused by a test as run before in that module. I cannot correctly isolate it but I feel it is exactly the same race condition as I noticed yesterday on bug 1309556. Local tests didn't show this problem anymore when I added a sleep before the client tries calls start_session().
Depends on: 1309556
Comment on attachment 8794797 [details]
Bug 1141483 - After a restart Marionette doesn't restore the previous context.

https://reviewboard.mozilla.org/r/81092/#review85524

Latest update of the patch only contains changes as coming in from a rebase to latest mozilla-central. No further changes have been made. A try build has been issued: https://treeherder.mozilla.org/#/jobs?repo=try&revision=f9219de91d3d
Blocks: 1260502
Comment on attachment 8794797 [details]
Bug 1141483 - After a restart Marionette doesn't restore the previous context.

As turned out the hard-restarts seem to have issues on XP. Switching those to in_app restarts in the test, make those work for Mn on XP:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=6de7ae61d5aa

The listed failure for Linux in that above try build is because I missed to update the tests regarding the expected process id after the restart. A change similar to the other in_app restart tests was necessary. The following try proves that it fixed the issue:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=aa4ce3e2b8ebb07253d52af6e553beccfdfd5216

Given that there were a couple changes since the original review, I would like to have another one. Thanks.
Attachment #8794797 - Flags: review?(dburns)
Comment on attachment 8794797 [details]
Bug 1141483 - After a restart Marionette doesn't restore the previous context.

https://reviewboard.mozilla.org/r/81094/#review87702
Attachment #8794797 - Flags: review?(dburns) → review+
Pushed by hskupin@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/85e036aafe99
After a restart Marionette doesn't restore the previous context. r=automatedtester
https://hg.mozilla.org/mozilla-central/rev/85e036aafe99
Status: ASSIGNED → RESOLVED
Closed: 8 years ago8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
Product: Testing → Remote Protocol
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: