Closed Bug 581733 Opened 14 years ago Closed 14 years ago

mozmill-restart can't shutdown browser if a modal dialog is open (Timeout: bridge.execFunction)

Categories

(Testing Graveyard :: Mozmill, defect)

defect
Not set
normal

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: whimboo, Assigned: k0scist)

Details

(Whiteboard: [mozmill-1.4.2+])

Attachments

(2 files)

With the beta 1 of Mozmill 1.4.2 installed on our Ubuntu 9.10 machine in the QA lab the browser doesn't get closed during the restart tests, i.e. installing an extension. Sometimes we hang in the modal installation dialog. I can see the following error in the console:

Timeout: bridge.execFunction("40a9a744-9739-11df-8c69-000c291ab448", bridge.registry["{23062957-73e0-4730-b6f3-fe43481f45d4}"]["cleanQuit"], [])

Haven't ever seen this before. So I would assume this is a regression introduced with the global timeout.
Do you have reproduction steps?
Isn't beta 1 just the re-factored directory structure (i.e before global timeout was added)?

Also I haven't been able to reproduce this while running the extension install/uninstall test on Ubuntu 10.04.
Whiteboard: [mozmill-1.4.2?] → [mozmill-1.4.2+]
Reproduced today while running Mozmill (latest dev build) tests against the patches mozmill-tests tests for localized builds:

TEST-START | /Volumes/data/testing/mozmill/default/firefox/testTabbedBrowsing/testBackgroundTabScrolling.js | teardownModule
Timeout: bridge.execFunction("204f65e2-9a4f-11df-845c-002332b130e8", bridge.registry["{3d0f67c4-f96f-244a-a58d-ea170984e29c}"]["runTestDirectory"], ["/Volumes/data/testing/mozmill/default/firefox"])

TEST-UNEXPECTED-FAIL | Disconnect Error: Application unexpectedly closed
INFO Passed: 191
INFO Failed: 8
INFO Skipped: 8
Timeout: bridge.set("14a96214-9a50-11df-845c-002332b130e8", Components.utils.import('resource://mozmill/modules/mozmill.js'))
Well, on my local box Firefox get closed correctly. So seeing this message on qa-horus means Firefox has closed and we have that message in the logs now? WFM?
Ok, this can be related to the mystic hangs which we cover on bug 571630.

Steps:
1. Do a git pull mozauto 1.4.2 to get the latest dev version
2. Get all mozmill-tests
3. Execute Mozmill like "mozmill -b /Applications/Minefield.app/ --show-errors -t firefox/testTabbedBrowsing/"

As you can see above we hang inside of the teardownModule. It's not related to its content because even removing all the code doesn't help. We are still waiting for something and don't continue with the next test. That means after the global timeout we kill Firefox.

If this the expected behavior I'm fine with it and we should continue the discussion on bug 571630.
Assignee: nobody → jhammel
i'll take this bug if i can reproduce
Well, isn't that the exact notice we want to give when a test-run has been stopped by killing the process?
(In reply to comment #5)
> Ok, this can be related to the mystic hangs which we cover on bug 571630.
> 
> Steps:
> 1. Do a git pull mozauto 1.4.2 to get the latest dev version
> 2. Get all mozmill-tests
> 3. Execute Mozmill like "mozmill -b /Applications/Minefield.app/ --show-errors
> -t firefox/testTabbedBrowsing/"
> 
> As you can see above we hang inside of the teardownModule. It's not related to
> its content because even removing all the code doesn't help. We are still
> waiting for something and don't continue with the next test. That means after
> the global timeout we kill Firefox.
> 
> If this the expected behavior I'm fine with it and we should continue the
> discussion on bug 571630.

I don't get this hang, ubuntu 10.04, minefield from a few days ago (can update to today's version if applicable), tests and mozmill 1.4.2 from today.  Am I doing something wrong or is this system dependent?
Is there anything displayed on the failing system that might be a clue as to why it is failing?
(In reply to comment #8)
> (In reply to comment #5)
> > Ok, this can be related to the mystic hangs which we cover on bug 571630.
> > 
> > Steps:
> > 1. Do a git pull mozauto 1.4.2 to get the latest dev version
> > 2. Get all mozmill-tests
> > 3. Execute Mozmill like "mozmill -b /Applications/Minefield.app/ --show-errors
> > -t firefox/testTabbedBrowsing/"
> > 
> > As you can see above we hang inside of the teardownModule. It's not related to
> > its content because even removing all the code doesn't help. We are still
> > waiting for something and don't continue with the next test. That means after
> > the global timeout we kill Firefox.
> > 
> > If this the expected behavior I'm fine with it and we should continue the
> > discussion on bug 571630.
> 
> I don't get this hang, ubuntu 10.04, minefield from a few days ago (can update
> to today's version if applicable), tests and mozmill 1.4.2 from today.  Am I
> doing something wrong or is this system dependent?

Likewise, tried on Mac OSX and do not get this hang.
All the failures below happen with the testrun_daily.py automation script on our Windows boxes:

Windows Vista - Mozmill 1.4.2 Beta 1:

Test Failed : testInstallExtension in c:\users\mozilla\appdata\local\temp\tmpzeathw.mozmill-tests\firefox\restartTests\testExtensionInstallUninstall\test1.js
Test Failed : testInstallExtension in c:\users\mozilla\appdata\local\temp\tmpzeathw.mozmill-tests\firefox\restartTests\testExtensionInstallUninstall\test1.js
Timeout: bridge.execFunction("67063a31-9b29-11df-a1e7-000c297be8a8", bridge.registry["{f2d9e826-07de-47f1-a60a-4949be896a3b}"]["cleanQuit"], [])

2 Namoroka instances were open in parallel and there were some error dialogs that the update cannot be installed because another version is running.

Windows XP - Mozmill 1.4.2 Beta 1:

41 files updated, 0 files merged, 0 files removed, 0 files unresolved
Test Failed : testLarryGreen in c:\docume~1\mozilla\locals~1\temp\tmpqmj3d6.mozmill-tests\firefox\testSecurity\testGreenLarry.js
Test Failed : testNotificationBar in c:\docume~1\mozilla\locals~1\temp\tmpqmj3d6.mozmill-tests\firefox\testSecurity\testSafeBrowsingNotificationBar.js
Passed 224 :: Failed 2 :: Skipped 0
Sending results to 'http://brasstacks.mozilla.com/couchdb/mozmill/_design/reports/_update/report' failed (No JSON object could be decoded: line 1 column 0 (char 0)).

Timeout: bridge.execFunction("55f70ce5-9bee-11df-bfef-000c29bced40", bridge.registry["{877fa6cb-398f-438d-8c2b-05df1547ad5e}"]["runTestFile"], ["c:\\docume~1\\mozilla\\locals~1\\temp\\tmpqmj3d6.mozmill-tests\\firefox\\restartTests\\testThemeInstallUninstall\\test1.js"])

No hanging instances on this box. Everything has been shutdown cleanly.

Windows 2000 - Mozmill 1.4.2 Beta 1:

Sending results to 'http://brasstacks.mozilla.com/couchdb/mozmill/_design/reports/_update/report' failed (Extra data: line 1 column 9 - line 1 column 32 (char 9 - 32)).

Timeout: bridge.execFunction("afd94350-9bed-11df-97f3-000c29e0532e", bridge.registry["{f10f35e7-d866-4482-89ab-cc750f1320b0}"]["runTestFile"], ["c:\\docume~1\\mozilla\\locals~1\\temp\\tmpngwqdw.mozmill-tests\\firefox\\restartTests\\testMasterPassword\\test1.js"])

Hang in master password dialog is correctly handled and browser has been killed. No running instances.


So the only issue I can see here is on the Vista box with the additional message:

bridge.registry["{f2d9e826-07de-47f1-a60a-4949be896a3b}"]["cleanQuit"]
OS: Linux → All
Hardware: x86 → All
Windows 2000, XP, and Vista were hanging today in the restart test restartTests/testInstallMultipleExtensions. The hang happened in test1.js when trying to install the extensions. For one of those the timeout was too short for the download and waitForEval throws an error. That normally should close the browser but there is a modal dialog which asks you, if you want to cancel the download. In that state our test-run hung and the browser hasn't been killed.

To reproduce that the timeout for the download would simply have to be decreased to under 1s. Should be easy to reproduce.

I hope that will help you Jeff.
(In reply to comment #12)
> Windows 2000, XP, and Vista were hanging today in the restart test
> restartTests/testInstallMultipleExtensions. 

I assume you mean testMultipleExtensionInstallation?

http://hg.mozilla.org/qa/mozmill-tests/file/tip/firefox/restartTests/testMultipleExtensionInstallation
(In reply to comment #12)
> Windows 2000, XP, and Vista were hanging today in the restart test
> restartTests/testInstallMultipleExtensions. The hang happened in test1.js when
> trying to install the extensions. For one of those the timeout was too short
> for the download and waitForEval throws an error. That normally should close
> the browser but there is a modal dialog which asks you, if you want to cancel
> the download. In that state our test-run hung and the browser hasn't been
> killed.
> 
> To reproduce that the timeout for the download would simply have to be
> decreased to under 1s. Should be easy to reproduce.
> 
> I hope that will help you Jeff.

I get an addons failure using nightly minefield and latest mozmill + tests. Should I be running any particular version of FF?

$ Scripts/mozmill-restart.exe -b /c/Program\ Files/Minefield/firefox.exe -t src
/mozmill-tests/firefox/restartTests/testMultipleExtensionInstallation/ --showal
l
DEBUG | mozmill.startRunner | true
DEBUG | mozmill.setState | "currentState"
DEBUG | mozmill.setModule | "currentModule"
DEBUG | mozmill.setState | "currentState"
TEST-START | c:\mozmill\mozmill\src\mozmill-tests\firefox\restartTests\testMulti
pleExtensionInstallation\test1.js | setupModule
DEBUG | mozmill.setTest | {"name": "setupModule", "filename": "c:\\mozmill\\mozm
ill\\src\\mozmill-tests\\firefox\\restartTests\\testMultipleExtensionInstallatio
n\\test1.js"}
WARNING | setupModule | (SKIP) Bug 569813: New add-ons manager not supported yet

DEBUG | mozmill.endTest | {"passes": [], "fails": [], "skipped": true, "name": "
setupModule", "filename": "c:\\mozmill\\mozmill\\src\\mozmill-tests\\firefox\\re
startTests\\testMultipleExtensionInstallation\\test1.js", "failed": 0, "passed":
 0, "skipped_reason": "Bug 569813: New add-ons manager not supported yet"}
TEST-START | c:\mozmill\mozmill\src\mozmill-tests\firefox\restartTests\testMulti
pleExtensionInstallation\test1.js | testInstallExtensions
DEBUG | mozmill.setTest | {"name": "testInstallExtensions", "filename": "c:\\moz
mill\\mozmill\\src\\mozmill-tests\\firefox\\restartTests\\testMultipleExtensionI
nstallation\\test1.js"}
INFO | Test Skipped: "setupModule failed."
WARNING | testInstallExtensions | (SKIP) setupModule failed.
DEBUG | mozmill.endTest | {"passes": [], "fails": [], "skipped": true, "name": "
testInstallExtensions", "filename": "c:\\mozmill\\mozmill\\src\\mozmill-tests\\f
irefox\\restartTests\\testMultipleExtensionInstallation\\test1.js", "failed": 0,
 "passed": 0, "skipped_reason": "setupModule failed."}
DEBUG | mozmill.persist | {"addons": [{"url": "https://preview.addons.mozilla.or
g/firefox/addon/6543/", "name": "Nightly Tester Tools", "id": "{8620c15f-30dc-4d
ba-a131-7c5d20cf4a29}"}, {"url": "https://preview.addons.mozilla.org/firefox/add
on/5428", "name": "Mozilla QA Companion", "id": "{667e9f3d-0096-4d2b-b171-9a96af
babe20}"}]}
DEBUG | mozmill.endRunner | true
DEBUG | mozmill.startRunner | true
DEBUG | mozmill.setState | "currentState"
DEBUG | mozmill.setModule | "currentModule"
DEBUG | mozmill.setState | "currentState"
TEST-START | c:\mozmill\mozmill\src\mozmill-tests\firefox\restartTests\testMulti
pleExtensionInstallation\test2.js | setupModule
DEBUG | mozmill.setTest | {"name": "setupModule", "filename": "c:\\mozmill\\mozm
ill\\src\\mozmill-tests\\firefox\\restartTests\\testMultipleExtensionInstallatio
n\\test2.js"}
WARNING | setupModule | (SKIP) Bug 569813: New add-ons manager not supported yet

DEBUG | mozmill.endTest | {"passes": [], "fails": [], "skipped": true, "name": "
setupModule", "filename": "c:\\mozmill\\mozmill\\src\\mozmill-tests\\firefox\\re
startTests\\testMultipleExtensionInstallation\\test2.js", "failed": 0, "passed":
 0, "skipped_reason": "Bug 569813: New add-ons manager not supported yet"}
TEST-START | c:\mozmill\mozmill\src\mozmill-tests\firefox\restartTests\testMulti
pleExtensionInstallation\test2.js | testCheckInstalledExtensions
DEBUG | mozmill.setTest | {"name": "testCheckInstalledExtensions", "filename": "
c:\\mozmill\\mozmill\\src\\mozmill-tests\\firefox\\restartTests\\testMultipleExt
ensionInstallation\\test2.js"}
INFO | Test Skipped: "setupModule failed."
WARNING | testCheckInstalledExtensions | (SKIP) setupModule failed.
DEBUG | mozmill.endTest | {"passes": [], "fails": [], "skipped": true, "name": "
testCheckInstalledExtensions", "filename": "c:\\mozmill\\mozmill\\src\\mozmill-t
ests\\firefox\\restartTests\\testMultipleExtensionInstallation\\test2.js", "fail
ed": 0, "passed": 0, "skipped_reason": "setupModule failed."}
DEBUG | mozmill.persist | {"addons": [{"url": "https://preview.addons.mozilla.or
g/firefox/addon/6543/", "name": "Nightly Tester Tools", "id": "{8620c15f-30dc-4d
ba-a131-7c5d20cf4a29}"}, {"url": "https://preview.addons.mozilla.org/firefox/add
on/5428", "name": "Mozilla QA Companion", "id": "{667e9f3d-0096-4d2b-b171-9a96af
babe20}"}]}
DEBUG | mozmill.endRunner | true
INFO Passed: 0
INFO Failed: 0
INFO Skipped: 4
All add-ons test on default (Minefield) are marked as skipped, as the report says. You would have to use Namoroka or Shiretoko. But that's not really necessary. I have a much better and totally simple testcase for you.

The problem we have is, once a test is using a modal dialog which can also be a system dialog, we are failing to kill the browser when the global timeout kicks in:

var testNoKill = function() {
  controller = mozmill.getBrowserController();

  // Open the "File Open" dialog
  controller.keypress(null, "o", {accelKey: true});
}

Here the log from the command line:

TEST-START | /Volumes/data/build/tools/mozmill/mozmill/test/restart/test_user_restart/test1.js | testRestartBeforeTimeout
Timeout: bridge.execFunction("04059ae0-9f32-11df-8fbc-002332b130e8", bridge.registry["{13ba93ab-fcf6-2d41-a5c6-844d9b07cf18}"]["runTestFile"], ["/Volumes/data/build/tools/mozmill/mozmill/test/restart/test_user_restart/test1.js"])

TEST-UNEXPECTED-FAIL | Disconnect Error: Application unexpectedly closed
INFO Passed: 1
INFO Failed: 0
INFO Skipped: 0
(In reply to comment #15)
> All add-ons test on default (Minefield) are marked as skipped, as the report
> says. You would have to use Namoroka or Shiretoko. But that's not really
> necessary. I have a much better and totally simple testcase for you.
> 
> The problem we have is, once a test is using a modal dialog which can also be a
> system dialog, we are failing to kill the browser when the global timeout kicks
> in:
> 
> var testNoKill = function() {
>   controller = mozmill.getBrowserController();
> 
>   // Open the "File Open" dialog
>   controller.keypress(null, "o", {accelKey: true});
> }
> 
> Here the log from the command line:
> 
> TEST-START |
> /Volumes/data/build/tools/mozmill/mozmill/test/restart/test_user_restart/test1.js
> | testRestartBeforeTimeout
> Timeout: bridge.execFunction("04059ae0-9f32-11df-8fbc-002332b130e8",
> bridge.registry["{13ba93ab-fcf6-2d41-a5c6-844d9b07cf18}"]["runTestFile"],
> ["/Volumes/data/build/tools/mozmill/mozmill/test/restart/test_user_restart/test1.js"])
> 
> TEST-UNEXPECTED-FAIL | Disconnect Error: Application unexpectedly closed
> INFO Passed: 1
> INFO Failed: 0
> INFO Skipped: 0

I cannot reproduce this on the windows 2000 VM.  I run mozmill with the above test case.  I do get the bridge disconnect error above.  But the minefield gets correctly closed.  Additional runs have no problem starting up and produce identical results.  

I *do* get cleanup errors for the profile -- windows complains that the files are in use by another process (presumedly the Firefox process before its shutdown is complete).  But can't reproduce the failure to shutdown the browser error.
Alright, as my last comment showed I was running restart tests with mozmill-restart. I haven't executed "mozmill". Now with the latest statement from Jeff I have tried it again on all systems and I have seen that the normal mozmill command works quite fine. Only the mozmill-restart command is not able to kill the application. I'm able to reproduce this behavior with the above mentioned script across all platforms.
Summary: Mozmill sometimes can't shutdown browser (Timeout: bridge.execFunction) → mozmill-restart can't shutdown browser if a modal dialog is open (Timeout: bridge.execFunction)
this script will reproduce the bug.  should work on any os with bash
(In reply to comment #0)
> With the beta 1 of Mozmill 1.4.2 installed on our Ubuntu 9.10 machine in the QA
> lab the browser doesn't get closed during the restart tests, i.e. installing an
> extension. Sometimes we hang in the modal installation dialog. I can see the
> following error in the console:
> 
> Timeout: bridge.execFunction("40a9a744-9739-11df-8c69-000c291ab448",
> bridge.registry["{23062957-73e0-4730-b6f3-fe43481f45d4}"]["cleanQuit"], [])
> 
> Haven't ever seen this before. So I would assume this is a regression
> introduced with the global timeout.

This is not a regression.  Behaviour is identical to previous to the global timeout except that the python end gets closed.
This has to do with how we stop the runner and cleanup our harnesses.  Note that this logic hasn't at all changed for `mozmill-restart`.  It has changed slightly (in implementation, hopefully not in intent) for regular `mozmill`, but mostly due to the fact that I couldn't figure out what the intent was of cleanup for `mozmill-restart`, I left that part alone entirely except moving code around.

So our cleanup code is here

http://github.com/mozautomation/mozmill/blob/master/mozmill/mozmill/__init__.py#L456

and here

http://github.com/mozautomation/mozmill/blob/master/mozmill/mozmill/__init__.py#L497

Note that there is no end-of-test stop for MozmillRestart:

http://github.com/mozautomation/mozmill/blob/master/mozmill/mozmill/__init__.py#L659

Again, this is all, control-flow-wise, as it was originally.

These blocks of code are really messy.  close_bridge is set to false for MozmillRestart's call to stop_runner.  This is switched on to determine what the stop_runner function does.  In the case that it is true, additional cleanup is done.  Additionally, (fallback) cleanup is done in Mozmill.stop() .

None of this is robust.  Also, none of this is particularly obvious to me as far as why it is done this way.  I wanted to clean up this control flow a bit, but I couldn't figure out what was necessary as far as what should be called and the order it should be called.

This can probably be worked around for now with a hard runner kill in MozmillRestart.stop()

def stop(self, hard=False):
  if hard:
    self.runner.kill()
    self.runner.profile.cleanup()

If it is desirable to do this one-off, we can.  However, the usual caveats:

 - this is not robust!  There will be edge cases that this does not suffice for! the solution for this of course is...

 - this entire control should be reconsidered.  We should make sure that we're doing what we think we're doing.  We should make this robust. Since it is vitally important to automation that things are cleaned up correctly, this is the critical point
(In reply to comment #20)
> This can probably be worked around for now with a hard runner kill in
> MozmillRestart.stop()
> 
> def stop(self, hard=False):
>   if hard:
>     self.runner.kill()
>     self.runner.profile.cleanup()
> 
> If it is desirable to do this one-off, we can.  However, the usual caveats:
> 
>  - this is not robust!  There will be edge cases that this does not suffice
> for! the solution for this of course is...
> 
>  - this entire control should be reconsidered.  We should make sure that we're
> doing what we think we're doing.  We should make this robust. Since it is
> vitally important to automation that things are cleaned up correctly, this is
> the critical point
Yeah, I hear Henrik's point that he wants to be able to ensure randomly started modal dialogs don't block tests.  However, randomly started modal dialogs have ALWAYS blocked mozmill tests.  This is not a new issue, and it's certainly not a regression.

If this hard kill does the trick with modal dialogs, then let's take that for 1.4.2.  But this is as far down the rabbit hole as we go on this issue.  You will always be able to find one more modal dialog that won't die, and until we completely refactor this code, we're just chasing ghosts.  We can't refactor the code for 1.4.2, so let's try this simple fix.  If it works in this case, and in the normal "shutdown case" when a test ends, then we take it.  But if you find another edge case here, I wouldn't be surprised, and I wouldn't hold 1.4.2 for it.
(In reply to comment #21)
> Yeah, I hear Henrik's point that he wants to be able to ensure randomly started
> modal dialogs don't block tests.  However, randomly started modal dialogs have
> ALWAYS blocked mozmill tests.  This is not a new issue, and it's certainly not
> a regression.

First thanks to Jeff for the investigation. It's true that a restart test-run never has been killed before, but the same was the case for restartless test-runs. While the latter one has been changed, so that we can kill the application, the former one hasn't. I got the impression that the global timeout patch will enable us to kill any of our supported test types. But as I can see now, that's not the case.
 
> If this hard kill does the trick with modal dialogs, then let's take that for
> 1.4.2.  But this is as far down the rabbit hole as we go on this issue.  You
> will always be able to find one more modal dialog that won't die, and until we
> completely refactor this code, we're just chasing ghosts.  We can't refactor
> the code for 1.4.2, so let's try this simple fix.  If it works in this case,
> and in the normal "shutdown case" when a test ends, then we take it.  But if
> you find another edge case here, I wouldn't be surprised, and I wouldn't hold
> 1.4.2 for it.

Agree. On the other hand that would definitely throw bad light on Mozmill regarding the buildbot integration. When we can't kill the application when a restart test hangs, releng will not be happy about. If the hard kill trick doesn't work should we target a follow-up release for those critical items only? I think we should add the dependency to bug 516984.
Keywords: regression
This works around this bug for the reproduction script above.  I have not tested it on user restart tests and, as the long comment in the patch says, it is not robust and i'm sure doesn't handle all cases.  But if we want a quick fix, this could be it.
Attachment #462856 - Flags: review?(ahalberstadt)
(In reply to comment #21)
> (In reply to comment #20)
> > This can probably be worked around for now with a hard runner kill in
> > MozmillRestart.stop()
> > 
> > def stop(self, hard=False):
> >   if hard:
> >     self.runner.kill()
> >     self.runner.profile.cleanup()
> > 
> > If it is desirable to do this one-off, we can.  However, the usual caveats:
> > 
> >  - this is not robust!  There will be edge cases that this does not suffice
> > for! the solution for this of course is...
> > 
> >  - this entire control should be reconsidered.  We should make sure that we're
> > doing what we think we're doing.  We should make this robust. Since it is
> > vitally important to automation that things are cleaned up correctly, this is
> > the critical point
> Yeah, I hear Henrik's point that he wants to be able to ensure randomly started
> modal dialogs don't block tests.  However, randomly started modal dialogs have
> ALWAYS blocked mozmill tests.  This is not a new issue, and it's certainly not
> a regression.

It is also not just modal dialogues.  Anything that triggers the global timeout with mozmill-restart will not clean up correctly.  Not surprising, since no one wrote any cleanup code.  As per comment 20, I wanted to clean up correctly when I was looking at it, but I couldn't figure out the logic that is actually there.

> If this hard kill does the trick with modal dialogs, then let's take that for
> 1.4.2.  But this is as far down the rabbit hole as we go on this issue.  You
> will always be able to find one more modal dialog that won't die, and until we
> completely refactor this code, we're just chasing ghosts.  We can't refactor
> the code for 1.4.2, so let's try this simple fix.  If it works in this case,
> and in the normal "shutdown case" when a test ends, then we take it.  But if
> you find another edge case here, I wouldn't be surprised, and I wouldn't hold
> 1.4.2 for it.

I've added another level of cleanup here. It is undoubtedly in the wrong place and will not handle all edge cases, but it does workaround this issue.  For 1.4.2 I would hate to do much more.  See attachment.
(In reply to comment #22)
> (In reply to comment #21)
> > Yeah, I hear Henrik's point that he wants to be able to ensure randomly started
> > modal dialogs don't block tests.  However, randomly started modal dialogs have
> > ALWAYS blocked mozmill tests.  This is not a new issue, and it's certainly not
> > a regression.
> 
> First thanks to Jeff for the investigation. It's true that a restart test-run
> never has been killed before, but the same was the case for restartless
> test-runs. While the latter one has been changed, so that we can kill the
> application, the former one hasn't. I got the impression that the global
> timeout patch will enable us to kill any of our supported test types. But as I
> can see now, that's not the case.

Firstly, there are two issues here.  Global timeout *does* kill mozmill for all of our cases.  That's all it does.  It does not add additional cleanup code.  Since the cleanup code didn't exist before, nothing additional (like shutting down the browser).  Our entire stop procedure is very brittle.  As per comment 20, since this is vital to automation, this is a big deal.  This deals with one obvious edge case.  But as the comment on the patch says, it is not robust and there are probably several edge cases where it won't work.  This fix is, plainly put, dead wrong.  If another edge case came up that this didn't fix, and it were to be fixed in the same way, it would involve adding yet another kill functionality at the end without understanding any of the preceding code.

> > If this hard kill does the trick with modal dialogs, then let's take that for
> > 1.4.2.  But this is as far down the rabbit hole as we go on this issue.  You
> > will always be able to find one more modal dialog that won't die, and until we
> > completely refactor this code, we're just chasing ghosts.  We can't refactor
> > the code for 1.4.2, so let's try this simple fix.  If it works in this case,
> > and in the normal "shutdown case" when a test ends, then we take it.  But if
> > you find another edge case here, I wouldn't be surprised, and I wouldn't hold
> > 1.4.2 for it.
> 
> Agree. On the other hand that would definitely throw bad light on Mozmill
> regarding the buildbot integration. When we can't kill the application when a
> restart test hangs, releng will not be happy about. If the hard kill trick
> doesn't work should we target a follow-up release for those critical items
> only? I think we should add the dependency to bug 516984.

The tests cherry-picked for mozilla-central should be robust and thoroughly tested and staged as documented on bug 516984.  Ultimately, when global timeout is triggered, it usually means that something is wrong with a test.  While we are still finding harness failures, it will always be possible to write a test that does something bad.  Again, the correct solution is to make our closing procedure as robust as possible, which means providing a single point of failure.  This patch does not address that. Mozmill has a number of bugs and serious architecture issues.  Hopefully these will be addressed by mozmill 2.0.  But this shouldn't be a criteria for using the software in production;  if it was, MS Office wouldn't be used by any business.  The question is, for the set of tests and buildbot steps provided, will Mozmill hang?  The answer in practicality is probably "No if we make sure we pick a good set of tests".
Attachment #462807 - Attachment mime type: application/x-sh → text/plain
see also the bigger issue: https://bugzilla.mozilla.org/show_bug.cgi?id=584464
The patch works fine on all platforms, except that we kill the whole restart test-run. As talked with Jeff on IRC I have filed that as bug 584470 which is definitely post Mozmill-1.4.2 stuff.
Comment on attachment 462856 [details] [diff] [review]
workaround for killing browser

Tested with the user-restart tests as well as manually closing other random tests and it appears to work fine.

It also fixes the case where the user doesn't specify a user restart and then restarts the browser.  It used to just throw the 'Application Disconnected' error and then the instance of the browser would pop back up and be left running.  Now the restarted instance gets closed.

r+, r=ahalberstadt
Attachment #462856 - Flags: review?(ahalberstadt) → review+
pushed to master

http://github.com/mozautomation/mozmill/commit/2852266041a448fdacdefb93cb1c875f623a3232

I'll push to 1.4.2 as soon as I can figure out how to do that
pushed to 1.4.2

http://github.com/mozautomation/mozmill/commit/1decfe71d8f7a23be7dd19562e55e0179d707d9b
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Works as expected on OS X and Windows. Lets call it verified fixed.
Status: RESOLVED → VERIFIED
Product: Testing → Testing Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: