Closed Bug 834635 Opened 11 years ago Closed 11 years ago

[Mac] mozmill-restart fails to force kill the application in the middle of a testrun if application is busted

Categories

(Testing Graveyard :: Mozmill, defect)

All
macOS
defect
Not set
blocker

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: whimboo, Assigned: whimboo)

References

Details

(Whiteboard: [mozmill-1.5.20+][mozmill-1.5.20-fixed])

Attachments

(1 file, 1 obsolete file)

Because of bug 834432 we had busted l10n builds. Those caused a yellow-screen-of-death window to appear instead of the browser window. In those cases we timeout but Mozmill is not able to kill those applications on OS X. Linux and Windows platforms are working fine.

Steps to reproduce:

1. Download a broken l10n nightly build which raises a yellow-screen-of-death
window from here:
http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2013/01/2013-01-24-05-41-58-mozilla-central-l10n/

2. Clone our mozmill tests repository 

3. Execute mozmill like: mozmill -b
/Volumes/Nightly/FirefoxNightly.app/Contents/MacOS/firefox -t
tests/functional/testToolbar/

I'm going to investigate that now. It's clearly a blocker for us!
Blocks: 813170
Each time we want to restart Firefox in a restart test I can see the following abort message in the console:

###!!! ABORT: file ../../../../ipc/chromium/src/base/thread_local_posix.cc, line 33

http://mxr.mozilla.org/mozilla-central/source/ipc/chromium/src/base/thread_local_posix.cc#33

30 // static
31 void ThreadLocalPlatform::SetValueInSlot(SlotType& slot, void* value) {
32   int error = pthread_setspecific(slot, value);
33   CHECK(error == 0);
34 }

Not sure how much that affects us but I would leave this beside for now.
So this only applies to mozmill-restart. It works fine for the mozmill command.
Summary: [Mac] Mozmill fails to force kill the application if it's busted → [Mac] mozmill-restart fails to force kill the application if it's busted
For restart tests it only happens in the middle of a testrun. It's enough to even have two skipped tests only. The instance of the application used for test1.js will still remain.
Summary: [Mac] mozmill-restart fails to force kill the application if it's busted → [Mac] mozmill-restart fails to force kill the application in the middle of a testrun if application is busted
So the problem is located here:
https://github.com/mozilla/mozmill/blob/hotfix-1.5/mozrunner/mozrunner/killableprocess.py#L312

As given by the timeout we should run the loop by default 30s, but by multiplying the count with two, we exit the loop already after 15s. This seems to be plainly wrong to me. 

Further it will fail later here:

https://github.com/mozilla/mozmill/blob/hotfix-1.5/mozmill/mozmill/__init__.py#L495

Given that we are back too early we do not properly stop the runner, and the application is not getting closed.

I will have to check how Mozmill 2 handles that code and if I can simply c&p it.
Attached patch Patch v1 (obsolete) — Splinter Review
Least invasive change to Mozrunner on the hotfix-1.5 branch. Jeff, if you think we should do something better please propose but I'm worried that we could break something.

In current mozprocess we do not differentiate between linux and mac and always call the `os.waitpid()` method. But as mentioned above I'm not inclined to change that without causing other regressions.
Attachment #706385 - Flags: review?(jhammel)
Comment on attachment 706385 [details] [diff] [review]
Patch v1

+                        while (count <= timeout):

No need for ()s here.  Please remove.  Otherwise, lgtm. I vaguely recall looking at this code and thinking it was, at best, strange
(In reply to Henrik Skupin (:whimboo) from comment #5)
<snip/>
> In current mozprocess we do not differentiate between linux and mac and
> always call the `os.waitpid()` method. But as mentioned above I'm not
> inclined to change that without causing other regressions.

Is this a bug? Worth filing?
Attached patch patch v1.1Splinter Review
Patch updated. Please also set the review flag if it's fine. :)

(In reply to Jeff Hammel [:jhammel] from comment #7)
> > In current mozprocess we do not differentiate between linux and mac and
> > always call the `os.waitpid()` method. But as mentioned above I'm not
> > inclined to change that without causing other regressions.
> 
> Is this a bug? Worth filing?

It seems to work there and we do not hit this problem. But I will make sure to test this once 1.5.20 has been released.
Attachment #706385 - Attachment is obsolete: true
Attachment #706385 - Flags: review?(jhammel)
Attachment #706439 - Flags: review?(jhammel)
> It seems to work there and we do not hit this problem. But I will make sure to test this once 1.5.20 has been released.

Awesome! Thanks.
Comment on attachment 706439 [details] [diff] [review]
patch v1.1

Looks good!
Attachment #706439 - Flags: review?(jhammel) → review+
Blocks: 834893
Fix landed for hotfix-1.5 branch:
https://github.com/mozilla/mozmill/commit/5c270c73567a279444d5cc1d3e3d52cb76cba9aa

I will leave this bug open for further investigation of Mozmill 2.0.
Whiteboard: [mozmill-1.5.20+] → [mozmill-1.5.20+][mozmill-1.5.20-fixed]
I cannot see that Mozmill 2.0 is affected by this bug. But there is bug 794020 which can clearly be reproduced with those builds. Clearly another thing.

Closing bug given that it is fixed for Mozmill 1.5.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Product: Testing → Testing Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: