Closed Bug 658509 Opened 13 years ago Closed 12 years ago

Failure in spawning the process: "Resource temporarily unavailable"

Categories

(Testing :: Mozbase, defect)

Platform: All
OS: macOS
Type: defect
Priority: Not set
Severity: critical

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: whimboo, Assigned: cmtalbert)

References

Details

(Whiteboard: [mozmill-1.5.7+])

Attachments

(3 files, 3 obsolete files)

While running my all-locales testrun for our functional tests, at some point the following failure appears during the restart tests:

   Process spawn failed with code 35!Sorry, cannot connect to jsbridge extension, port 24242

The teardown code in our automation script runs successfully:

   *** Removing old installation at /var/folders/2M/2MxHJNLXHluoVFD5WLTavE+++TI/-Tmp-/tmpykN51B.binary/Firefox.app
   *** Removing repository '/var/folders/2M/2MxHJNLXHluoVFD5WLTavE+++TI/-Tmp-/tmpd9oQQE.mozmill-tests'


But after that, any further attempt to mount a dmg image fails:

   *** Mounting firefox-5.0b2-build1.de.mac.dmg => /var/folders/2M/2MxHJNLXHluoVFD5WLTavE+++TI/-Tmp-/tmpONMcn1
   hdiutil: attach failed - Resource temporarily unavailable
   No section: 'App'

Right now I'm not sure why that happens; it needs some more investigation. The jsbridge issue only occurs on OS X; Windows and Linux do not show this problem. Further, I'm also not sure why the mounting doesn't work anymore, and whether that has anything to do with the jsbridge issue.
Well, I'm not sure where this error message comes from, but it has to be part of Mozmill. Jeff, do you know?
Component: Mozmill Automation → Mozmill Utilities
Product: Mozilla QA → Testing
QA Contact: mozmill-automation → mozmill-utilities
It really happens in the middle of a restart test:

TEST-START | /var/folders/2M/2MxHJNLXHluoVFD5WLTavE+++TI/-Tmp-/tmpd9oQQE.mozmill-tests/tests/functional/restartTests/testRestartChangeArchitecture/test2.js | testRestartedNormally
TEST-START | /var/folders/2M/2MxHJNLXHluoVFD5WLTavE+++TI/-Tmp-/tmpd9oQQE.mozmill-tests/tests/functional/restartTests/testRestartChangeArchitecture/test2.js | teardownTest
TEST-PASS | /var/folders/2M/2MxHJNLXHluoVFD5WLTavE+++TI/-Tmp-/tmpd9oQQE.mozmill-tests/tests/functional/restartTests/testRestartChangeArchitecture/test2.js | teardownTest
TEST-PASS | /var/folders/2M/2MxHJNLXHluoVFD5WLTavE+++TI/-Tmp-/tmpd9oQQE.mozmill-tests/tests/functional/restartTests/testRestartChangeArchitecture/test2.js | testRestartedNormally
Process spawn failed with code 35!Sorry, cannot connect to jsbridge extension, port 24242


It's not always reproducible and seems to require that a couple of other locales have been run first.
This is in jsbridge:

jsbridge/jsbridge/__init__.py:

        sleep(.25)
        ttl += .25
    if ttl == timeout:
        raise Exception("Sorry, cannot connect to jsbridge extension, port %s" % port)

    back_channel, bridge = create_network(host, port)
    sleep(.5)
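
For context, these lines are the tail of a poll-and-timeout loop; the part above the excerpt plausibly reads like this (a sketch only; check_port is an assumed helper, and the actual hotfix-1.5 source may differ):

    def wait_and_create_network(host, port, timeout=60):
        ttl = 0
        # poll until something is listening on the jsbridge port, or give up
        while not check_port(host, port) and ttl < timeout:
            sleep(.25)
            ttl += .25
        # ... continues with the timeout check and the create_network()
        # call quoted in the excerpt above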

In my experience, this occurs in one of two circumstances:
1. the port is not available
2. there is a failure in the harness code

We can probably safely eliminate 2. This leaves 1. or another error I have not yet seen. 

My blind guess is that, however this might happen, Firefox isn't fully shut down before the restart is called.

The other possibility is that the timeout for creating the network is exceeded. This is 60s by default.
(In reply to comment #3)
> We can probably safely eliminate 2. This leaves 1. or another error I have
> not yet seen. 

Why can we eliminate 2? You know we are still using 1.5.x for our testing, and there could still be some erroneous code floating around, something like what Clint is working on for OS X.

> My blind guess is that, however this would happen, Firefox isn't being
> shutdown 100% before the restart is called.

What's the best way to get this debugged? Do you have some hints?

> The other possibility is that the time to create the network is exceeded.
> This is 60s by default.

I don't think that this is the case.
Whiteboard: [mozmill-1.5.4?]
Can I get you to update the bug with the checkouts and the branches you are testing against as well as the command line you're using so we have some hope of reproducing it ourselves?

This error message "Process spawn failed" is not a Mozmill message; it's not in any part of our codebase.  That, combined with the fact that you're unable to mount a dmg, makes me suspect memory/space/resource issues on the machine.  That's the only thing I can think of that would cause the issues you're seeing.  I also can't find any mention of an error code 35 for Mac: http://support.apple.com/kb/ht2088, so I'm rather stumped.
Interesting! That's XRE code which has been added for Firefox 4:
http://mxr.mozilla.org/mozilla2.0/source/toolkit/xre/MacLaunchHelper.mm

Not sure yet what the error code 35 means, but I'm quite sure that the Mozmill test Dave added in bug 631052 is causing this problem now. It changes the architecture and restarts Firefox in 64-bit vs. 32-bit mode.

Josh, can you easily say what this error code means? The documentation for Darwin isn't that helpful:

http://developer.apple.com/library/mac/#documentation/Darwin/Reference/ManPages/man2/posix_spawn.2.html
Clint, also: why do we start the teardown function before the test function finishes? See the execution log in comment 2. That looks somewhat wrong.
Looks like error 35 is either EWOULDBLOCK or EAGAIN, which is the same "resource temporarily unavailable" problem that you're getting when trying to mount the dmg.
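
For reference, the mapping is easy to confirm from Python on the affected machine (on Darwin errno 35 is EAGAIN, with EWOULDBLOCK as an alias; the same number means something different on Linux):

    import errno
    import os

    print errno.errorcode[35]  # 'EAGAIN' on Darwin
    print os.strerror(35)      # 'Resource temporarily unavailable'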
(In reply to comment #4)
> (In reply to comment #3)
> > We can probably safely eliminate 2. This leaves 1. or another error I have
> > not yet seen. 
> 
> Why can we eliminate 2? You know we are still using 1.5.x for our testing.
> And there should still be some erroneous code floating around. Something
> like Clint is working on for OS X.

I am not aware of any harness failures that would affect this issue.  It is not impossible, hence the "probably", but I think it is less likely.  When I say harness failures, I ultimately mean genuine uncaught exceptions.

> > My blind guess is that, however this would happen, Firefox isn't being
> > shutdown 100% before the restart is called.
> 
> What's the best way to get this debugged? Do you have some hints?

I would put in a bunch of dump statements.  I realize that is vague and hand-wavy, but I'm not sure if there is much I can recommend methodologically.

> > The other possibility is that the time to create the network is exceeded.
> > This is 60s by default.
> 
> I don't think that this is the case.

I doubt it as well.
Simply save this file into the mozmill-automation root folder and run it with "all_locales_bft.py mac" on OS X. It should hopefully break after a dozen tests. Keep in mind that such a testrun can take a really long time.
(In reply to comment #8)
> Looks like error 35 is either EWOULDBLOCK or EAGAIN which is the same
> resource is unavailable problem that you're getting trying to mount the dmg.

Thanks Dave! Josh, do you have an idea when this could happen on OS X?
This failure always happens with the testRestartChangeArchitecture restart test. It's not bound to a specific locale but happens after a couple of those have been run for the functional tests. In the case of the link below it broke with the 'hi-IN' locale:

http://mozmill-archive.brasstacks.mozilla.com/#/functional/reports?branch=All&platform=All&from=2011-05-28&to=2011-05-29

You can see that because there are only results for the non-restart tests but not for the restart tests. Surprisingly, a follow-up 'it' locale could be mounted and run for the non-restart tests, but it too eventually failed in testRestartChangeArchitecture, right after test1.js.

I assume that there is a bad interaction between restarting Firefox with another architecture and JSBridge.
(In reply to comment #12)
> This failure always happens with the testRestartChangeArchitecture restart
> test. It's not bound to a specific locale but happens after a couple of
> those have been run for the functional tests. In the case of the link below
> it broke with the 'hi-IN' locale:
> 
> http://mozmill-archive.brasstacks.mozilla.com/#/functional/
> reports?branch=All&platform=All&from=2011-05-28&to=2011-05-29
> 
> You can see that because there are only results for the non-restart tests
> but not for the restart tests. Surprisingly a follow-up 'it' locale could be
> mounted and run for the non-restart tests. But it also failed finally in
> testRestartChangeArchitecture, right after test1.js.
> 
> I assume that there is a bad interaction between restarting Firefox with
> another architecture and JSBridge.

Awesome detective work Henrik. That's quite actionable.  I'll try running it on my mac and see if I can repro locally.
If we don't have a solution for that issue when I'm back from vacation, I will probably have to disable those architecture tests for the time being.
Clint, have you had a chance in the last two weeks to take a look at this?
To the best of my knowledge, Clint is out of town.  However, IIRC, he did put quite a bit of time/effort into reproducing this locally and was unable to.  That's about all I know.
Yeah, it came up at the last meeting. Clint said he couldn't reproduce it with the test run you gave him.

He indicated that having a very long multi-part restart test could help, and I discussed writing a quickie script that took part1 of some test and cloned it into part2-N. I dropped the ball on giving him that script, so he may have been blocked on me. :|

I'll make sure he has it by the time he's back.
I always hit this issue on qa-horus and qa-mozmill when running the update tests for all locales. As we agreed earlier, we should just use one of those boxes for the investigation, with qa-mozmill preferred.
This failure was machine related. I ran into even more trouble yesterday on qa-horus, which forced me to restart the machine. Afterward I haven't seen this issue anymore. That means it has nothing to do with Mozmill => closing as invalid.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → INVALID
Whiteboard: [mozmill-1.5.4?]
Happened again today on qa-horus. I will reopen this bug for further investigation and to track progress. Therefore moving it into Mozmill Automation.
Status: RESOLVED → REOPENED
Component: Mozmill Utilities → Mozmill Automation
Product: Testing → Mozilla QA
QA Contact: mozmill-utilities → mozmill-automation
Resolution: INVALID → ---
A Pivotal Tracker story has been created for this Bug: https://www.pivotaltracker.com/story/show/15490945
Henrik Skupin changed story state to started in Pivotal Tracker
Running into the same issue on qa-masterblaster now: it happens during the all_locales_bft.py testrun, reliably after the DA locale is tested.

This needs to be bumped in priority as it blocks us from reliably running l10n tests during releases.
Attachment #534886 - Attachment mime type: text/x-python-script → text/plain
Is there anything more in the logs than what is in the above comments?  I have not been able to reproduce this error, so any hints and reproduction steps would be appreciated. I do not really have access to a Mac for testing, so this is hard to diagnose over the wire.

I would also recommend porting the tests (and whatever harness pieces) to Mozmill 2 and trying that.  It handles restarting/stopping a bit more elegantly and should have better diagnostics for failures.
@Jeff, there is nothing more in the logs that I can point to. Henrik is on PTO until next week and might be able to provide more feedback at that time.

I was able to reproduce this on qa-mozmill as well last night, but it happened much later (IT locale instead of DA locale).

I'm guessing that this is something garbage-collection related (i.e. something is not being cleaned up properly and builds up with each iteration until the system can't take any more); highly speculative, I know.
(In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #25)
> I was able to reproduce this on qa-mozmill as well last night but it
> happened much later (IT locale instead of DA locale).

I have never seen this on qa-mozmill before, when OS X 10.5 was installed. Seems like after the upgrade to 10.7 it has also started on this box. I get the feeling that 10.6 introduced something which breaks us here.

(In reply to Henrik Skupin (:whimboo) from comment #6)
> Interesting! That's XRE code which has been added for Firefox 4:
> http://mxr.mozilla.org/mozilla2.0/source/toolkit/xre/MacLaunchHelper.mm
> 
> Not sure yet, what the error code 35 means but I'm certainly sure that the
> Mozmill test Dave has added on bug 631052 is causing this problem now. It
> changes the architecture and restarts Firefox in 64bit vs. 32bit.
> 
> Josh, can you easily say what this error code means? The documentation for
> Darwin isn't that helpful:
> 
> http://developer.apple.com/library/mac/#documentation/Darwin/Reference/
> ManPages/man2/posix_spawn.2.html

Josh or Steven, do either of you have an idea what this error message could mean?

Here is the non-verbose output from hdiutil:

*** Mounting firefox-8.0b3-build1.zu.mac.dmg => /var/folders/yc/yc-yt4kPHweMwoUftOHQgE+++TQ/-Tmp-/tmpK9nEx4
2011-10-12 16:02:20.652 hdiutil[36754:2503] timed out waiting for helper registration
hdiutil: attach failed - Operation timed out
Assignee: nobody → hskupin
Status: REOPENED → ASSIGNED
Whiteboard: [see comment 6]
> hdiutil: attach failed - Operation timed out

We use hdiutil (on the Mac) to build *.dmg images when running "make package".

Here it seems to be failing in an attempt to mount a *.dmg image.

More than that I can't say.
Thanks Steven. Given that the failure is located in the call to hdiutil, I have now started an all-locales test-run on qa-masterblaster with hdiutil's '-debug' option set. Let's see if I can reproduce it and get more information.
This is strange. Given the console output during the test-run, I do not think this issue is related to hdiutil, but rather to how we restart Firefox during the restart tests. I already mentioned that suspicion in comment 2, but at that time we had a different error message.

With the latest test-run I executed, the failure happens for the de build between the restart tests for the default bookmarks and the master password. Please note the 'sh: fork: Resource temporarily unavailable' error right in between:

TEST-START | /var/folders/yc/yc-yt4kPHweMwoUftOHQgE+++TQ/-Tmp-/tmp6_pqWO.mozmill-tests/tests/functional/restartTests/testDefaultBookmarks/test1.js | teardownModule
TEST-PASS | /var/folders/yc/yc-yt4kPHweMwoUftOHQgE+++TQ/-Tmp-/tmp6_pqWO.mozmill-tests/tests/functional/restartTests/testDefaultBookmarks/test1.js | test1.js::teardownModule
sh: fork: Resource temporarily unavailable
TEST-START | /var/folders/yc/yc-yt4kPHweMwoUftOHQgE+++TQ/-Tmp-/tmp6_pqWO.mozmill-tests/tests/functional/restartTests/testMasterPassword/test1.js | setupModule
TEST-PASS | /var/folders/yc/yc-yt4kPHweMwoUftOHQgE+++TQ/-Tmp-/tmp6_pqWO.mozmill-tests/tests/functional/restartTests/testMasterPassword/test1.js | test1.js::setupModule

Jeff, when Firefox gets restarted (not a user restart), is mozrunner involved in any way? Or is that completely done by Firefox? I wonder where this error message comes from. Currently the output is not very helpful.
Attached file hdiutil (debug) output
Debug output of a broken mount action with hdiutil.
(In reply to Henrik Skupin (:whimboo) from comment #29)
> This is strange. Given the console output during the test-run I do not think
> that this issue is related to hdiutil, but really how we restart Firefox
> during the restart tests. I have mentioned that suspicion already earlier in
> comment 2, but that time we had a different error message.

Is there anything in particular about the way we restart Firefox that raises these suspicions, or is it a gut feeling?  I'll say up front that the MozMillRestart logic from 1.5 is questionable, though it's actually better than the parent MozMill class.  In 2.0, both of these are better.
 
> With the latest test-run I have executed, the failure happens for the de
> build between the restart tests for default bookmarks and the master
> password. Please see the 'sh: fork: Resource temporarily unavailable' error
> right in between:
> 
> TEST-START |
> /var/folders/yc/yc-yt4kPHweMwoUftOHQgE+++TQ/-Tmp-/tmp6_pqWO.mozmill-tests/
> tests/functional/restartTests/testDefaultBookmarks/test1.js | teardownModule
> TEST-PASS |
> /var/folders/yc/yc-yt4kPHweMwoUftOHQgE+++TQ/-Tmp-/tmp6_pqWO.mozmill-tests/
> tests/functional/restartTests/testDefaultBookmarks/test1.js |
> test1.js::teardownModule
> sh: fork: Resource temporarily unavailable
> TEST-START |
> /var/folders/yc/yc-yt4kPHweMwoUftOHQgE+++TQ/-Tmp-/tmp6_pqWO.mozmill-tests/
> tests/functional/restartTests/testMasterPassword/test1.js | setupModule
> TEST-PASS |
> /var/folders/yc/yc-yt4kPHweMwoUftOHQgE+++TQ/-Tmp-/tmp6_pqWO.mozmill-tests/
> tests/functional/restartTests/testMasterPassword/test1.js |
> test1.js::setupModule

This looks like a system error message.  I don't know anything about Macs, but googling reveals, amongst other things, http://blog.ghostinthemachines.com/2010/01/19/mac-os-x-fork-resource-temporarily-unavailable/

> Jeff, when Firefox gets restarted (no user restart), is mozrunner involved
> in any way? Or is that completely done by Firefox? I wonder where this error
> messages comes from. Currently the output is not very helpful.

Other than user restart, mozrunner is always involved in process starting and stopping.
Thanks Jeff for the pointer! I was able to make more progress on it. I have now added a 'ps -ef | grep firefox' call right before each test-run, which gets called from the wrapper script. As a result we get hundreds of Firefox processes listed after a couple of test-runs.

After the first run:
  503 77339 77316   0   0:00.00 ttys005    0:00.00 (firefox-bin)
  503 77355 77316   0   0:00.00 ttys005    0:00.00 (firefox-bin)
  503 77363 77316   0   0:00.00 ttys005    0:00.00 (firefox-bin)
  503 77370 77316   0   0:00.00 ttys005    0:00.00 (firefox-bin)
  503 77377 77316   0   0:00.00 ttys005    0:00.00 (firefox-bin)
  503 77384 77316   0   0:00.00 ttys005    0:00.00 (firefox-bin)
  503 77391 77316   0   0:00.00 ttys005    0:00.00 (firefox-bin)
  503 77398 77316   0   0:00.00 ttys005    0:00.00 (firefox-bin)
  503 77405 77316   0   0:00.00 ttys005    0:00.00 (firefox-bin)
  503 77412 77316   0   0:00.00 ttys005    0:00.00 (firefox-bin)
  503 77419 77316   0   0:00.00 ttys005    0:00.00 (firefox-bin)
  503 77426 77316   0   0:00.00 ttys005    0:00.00 (firefox-bin)
  503 77433 77316   0   0:00.00 ttys005    0:00.00 (firefox-bin)
  503 77440 77316   0   0:00.00 ttys005    0:00.00 (firefox-bin)
  503 77454 77316   0   0:00.00 ttys005    0:00.00 (firefox-bin)

After the second run:
  503 77339 77316   0   0:00.00 ttys005    0:00.00 (firefox-bin)
  503 77355 77316   0   0:00.00 ttys005    0:00.00 (firefox-bin)
  503 77363 77316   0   0:00.00 ttys005    0:00.00 (firefox-bin)
  503 77370 77316   0   0:00.00 ttys005    0:00.00 (firefox-bin)
  503 77377 77316   0   0:00.00 ttys005    0:00.00 (firefox-bin)
  503 77384 77316   0   0:00.00 ttys005    0:00.00 (firefox-bin)
  503 77391 77316   0   0:00.00 ttys005    0:00.00 (firefox-bin)
  503 77398 77316   0   0:00.00 ttys005    0:00.00 (firefox-bin)
  503 77405 77316   0   0:00.00 ttys005    0:00.00 (firefox-bin)
  503 77412 77316   0   0:00.00 ttys005    0:00.00 (firefox-bin)
  503 77419 77316   0   0:00.00 ttys005    0:00.00 (firefox-bin)
  503 77426 77316   0   0:00.00 ttys005    0:00.00 (firefox-bin)
  503 77433 77316   0   0:00.00 ttys005    0:00.00 (firefox-bin)
  503 77440 77316   0   0:00.00 ttys005    0:00.00 (firefox-bin)
  503 77454 77316   0   0:00.00 ttys005    0:00.00 (firefox-bin)
  503 77461 77316   0   0:00.00 ttys005    0:00.00 (firefox-bin)
  503 77491 77316   0   0:00.00 ttys005    0:00.00 (firefox-bin)
  503 77509 77316   0   0:00.00 ttys005    0:00.00 (firefox-bin)
  503 77517 77316   0   0:00.00 ttys005    0:00.00 (firefox-bin)
  503 77524 77316   0   0:00.00 ttys005    0:00.00 (firefox-bin)
  503 77531 77316   0   0:00.00 ttys005    0:00.00 (firefox-bin)
  503 77538 77316   0   0:00.00 ttys005    0:00.00 (firefox-bin)
  503 77545 77316   0   0:00.00 ttys005    0:00.00 (firefox-bin)
  503 77552 77316   0   0:00.00 ttys005    0:00.00 (firefox-bin)
  503 77559 77316   0   0:00.00 ttys005    0:00.00 (firefox-bin)
  503 77566 77316   0   0:00.00 ttys005    0:00.00 (firefox-bin)
  503 77574 77316   0   0:00.00 ttys005    0:00.00 (firefox-bin)
  503 77581 77316   0   0:00.00 ttys005    0:00.00 (firefox-bin)
  503 77588 77316   0   0:00.00 ttys005    0:00.00 (firefox-bin)
  503 77595 77316   0   0:00.00 ttys005    0:00.00 (firefox-bin)
  503 77609 77316   0   0:00.00 ttys005    0:00.00 (firefox-bin)
  503 77616 77316   0   0:00.00 ttys005    0:00.00 (firefox-bin)

That means for each build under test we almost always leave 16 additional dead Firefox processes behind (15, 32, 48, 64, 80, ...). This increment stays constant, so I will now try to nail down whether it is related to the restart tests or to our handling of the Firefox process in general.
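
To make the growth easy to spot, the debug hook can count the defunct entries directly (a hypothetical helper; the wrapper itself just shells out to 'ps -ef | grep firefox'):

    import subprocess

    def count_dead_firefox():
        # defunct processes show up as parenthesized "(firefox-bin)" in ps
        ps = subprocess.Popen(["ps", "-ef"], stdout=subprocess.PIPE)
        out, _ = ps.communicate()
        return len([line for line in out.splitlines() if "(firefox-bin)" in line])

    print count_dead_firefox()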
My assumption was correct. When I remove the restart tests from the test-run, the number of left-over Firefox processes drops. Now I will try to figure out whether that is related to our automation scripts, or whether it also happens when calling mozmill directly from the command line.
So this issue is absolutely related to our implementation of the automation scripts, and probably highly dependent on the MozmillWrapper class.

I'm in limbo now about how best to proceed. One option would be to figure out what's wrong with our implementation for Mozmill 1.5.x and fix it; the other is to directly make all the changes needed to use the Mozmill 2 API. The latter I will have to do anyway.

Option 1 seems kinda hard because, as Jeff always says, the API on the hotfix-1.5 branch is awkward, and we will probably not get any further fixes or enhancements for it. Everyone is concentrating on Mozmill 2 and Mozmill 2.1 now.

I also don't want to spend my time fixing code which we will get rid of fairly soon. Until we have made a decision, I will check the implementation of runtest.py for the Thunderbird tests. That would hopefully give us a fix in the short term.
Summary: Failure in spawning the process (code 35) results in no jsbridge connection → Failure in spawning the process: "Resource temporarily unavailable"
Whiteboard: [see comment 6]
Ok, so I have re-arranged the scripts and got it partly working, but not by using mozmill directly. One way would be to subclass CLI and RestartCLI.

> class FirefoxCLI(mozmill.CLI):
>     parser_options = copy.copy(mozmill.CLI.parser_options)
>     parser_options[("--entities",)] = dict(dest="entities")
>     parser_options.pop(("-b", "--binary"))
> 
>     def __init__(self, *args, **kwargs):
>         mozmill.CLI.__init__(self, *args, **kwargs)
> 
>         self.options.debug = True
>         self.options.binary = self.args[0]
> 
> 
> FirefoxCLI().run()
> FirefoxCLI().run()
> FirefoxCLI().run()
> 
> cmdArgs = ["ps", "-ef"]
> subprocess.call(cmdArgs)

BUT, the code snippet above produces the same results:

 501 31587   298   0  6:13PM ttys000    0:00.60 python ./_test.py /Applications/Firefox/Nightly.app/ -t ../mozmill-tests/nightly/tests/functional/
  501 31590 31587   0  6:13PM ttys000    0:00.00 (firefox-bin)
  501 31601 31587   0  6:13PM ttys000    0:00.00 (firefox-bin)
  501 31610 31587   0  6:13PM ttys000    0:00.00 (firefox-bin)
 
There are still dead Firefox instances around when we are back in the shell. So I no longer believe that the leak is in our own automation scripts; it is in Mozmill itself. It seems the application processes never really get shut down.

I will move this over to Mozbase now. Jeff or Clint, can one of you please check the process handling, at least on OS X? Something is kinda broken here and it blocks us from running tests for all locales of Firefox. Thanks.
Assignee: hskupin → nobody
Component: Mozmill Automation → Mozbase
Product: Mozilla QA → Testing
QA Contact: mozmill-automation → mozbase
The solution can be found here:
http://bytes.com/topic/python/answers/44787-defunct-processes-subprocess-popen

We are simply missing a final os.wait() call.
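
The effect is easy to demonstrate outside of Mozmill with plain subprocess (a minimal sketch, not the actual mozrunner code path):

    import os
    import signal
    import subprocess

    proc = subprocess.Popen(["sleep", "60"])
    os.kill(proc.pid, signal.SIGTERM)

    # Without the following line the killed child lingers as a defunct
    # "(sleep)" entry in ps until the parent exits; waitpid() reaps it.
    os.waitpid(proc.pid, 0)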
Assignee: nobody → hskupin
If kill() is called, either as

    os.kill(process.pid, signal.SIGTERM)

or

    process.kill()

we should call wait() afterwards. Not calling wait() can leave a number of zombie processes that the OS can be really slow to clean up.

http://code.google.com/p/selenium/source/browse/trunk/py/selenium/webdriver/firefox/firefox_binary.py?spec=svn14564&r=14564#49 is my WebDriver code for killing the browser when we don't need it anymore. This works cross platform.
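
In the same spirit, a POSIX kill-and-reap helper might look like this (a sketch of the pattern, not a copy of the linked WebDriver code; on Windows you would go through Popen.kill() instead):

    import os
    import signal
    import time

    def kill_and_reap(pid, timeout=5):
        os.kill(pid, signal.SIGTERM)
        # poll with WNOHANG so the child is reaped as soon as it exits
        for _ in xrange(timeout * 10):
            if os.waitpid(pid, os.WNOHANG) != (0, 0):
                return
            time.sleep(0.1)
        os.kill(pid, signal.SIGKILL)  # it ignored SIGTERM; force it
        os.waitpid(pid, 0)            # final reap, so no zombie remains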
Is it known which version of which piece of software is being talked about here?
(In reply to Jeff Hammel [:jhammel] from comment #38)
> Is it known which version of which piece of software is being talked about
> here?

Jeff, please read my comment 35. This is mozrunner/killableprocess land.
Unassigning myself because this looks to be a simple fix which can be done by Jeff or Clint, who are both more familiar with the mozrunner/killableprocess code.
Assignee: hskupin → nobody
Status: ASSIGNED → NEW
Clint, can you please give us an update on this bug? When will your team have time to fix this issue, as mentioned in the last Mozmill meeting? Thanks.
Ok, here is a fix for Mozmill 1.5. Calling os.wait() doesn't work on Linux no matter what I do; I keep getting a traceback with "error 10: no child process found". But since the issue only appears on Mac OS X, I'm somewhat confident in this fix, which only calls os.wait() if we are on Mac OS X.
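
In outline, the gated call looks something like this (illustrative only, not the attached patch itself):

    import os
    import mozinfo

    def reap_children():
        # os.wait() on Linux raises OSError errno 10 (ECHILD) once no
        # children are left, so only reap on Mac OS X where the zombies
        # actually pile up
        if mozinfo.isMac:
            try:
                os.wait()
            except OSError:
                pass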

Henrik, do you have any means to test this?  No one on my team has ever been able to consistently reproduce this bug, but none of us are heavy Mac OS X users, so that is likely why.
Assignee: nobody → ctalbert
Status: NEW → ASSIGNED
Attachment #575557 - Flags: review?(jhammel)
Attachment #575557 - Flags: feedback?(hskupin)
Comment on attachment 575557 [details] [diff] [review]
Mozmill 1.5 Fix for mac os x

This works for me.  I have not tested it, though.
Attachment #575557 - Flags: review?(jhammel) → review+
Attached patch Patch for hotfix-2.0 (obsolete) — Splinter Review
Jeff, I think I'm hitting an issue you've seen.  With a stock checkout of hotfix-2.0, I'm getting an exception at the end of my test:
TEST-START | mutt/mutt/tests/js/test_nothing.js | setupModule
TEST-START | mutt/mutt/tests/js/test_nothing.js | test_nothing
INFO | Step Pass: {"function": "Controller.open()"}
INFO | Step Pass: {"function": "controller.waitFor()"}
INFO | Step Pass: {"function": "controller.waitForPageLoad()"}
TEST-PASS | mutt/mutt/tests/js/test_nothing.js | test_nothing
INFO | Timeout: bridge.execFunction("85439636-123e-11e1-aed1-000c29f1d61d", bridge.registry["{b421fada-0211-4e8a-9ae9-dfdff206e79b}"]["cleanQuit"], [])
INFO | 
INFO | Passed: 1
INFO | Failed: 0
INFO | Skipped: 0

Note that timeout.  It happens whether I have this patch active or not, and whether I put os.wait() directly after our os.kill statements or not.  So, right now I'm being cautious and just making exactly the same patch for the 2.0 codebase that I made for hotfix-1.5.  It doesn't seem to change the above error in any way on Linux (which I wouldn't expect it to, being gated with the mozinfo.isMac property).  So, I guess it's safe.  Is there a bug filed for the above error?
Attachment #575589 - Flags: review?(jhammel)
I have this filed as bug https://bugzilla.mozilla.org/show_bug.cgi?id=696468, though that includes all the weirdiosity I've seen on a particular machine and may conflate issues.  FWIW, that is the message I get with every shutdown.
Comment on attachment 575589 [details] [diff] [review]
Patch for hotfix-2.0

Looks good to me.  The other issue scares the hell out of me though
Attachment #575589 - Flags: review?(jhammel) → review+
Attached file testcase
Clint, this is a testcase we could probably easily transform into a Mutt test. Simply execute it with -b and -t. Before the script exits it calls 'ps -ef' via os.system(). As you can see, with Mozmill 1.5.6 there is one process listed:

501 10642 10639   0 12:53PM ttys001    0:00.00 (firefox-bin)

When you uncomment the os.wait() call in the test, it will solve the problem. I think it's a good candidate for a test.
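
For reference, the attached testcase is roughly of this shape (a sketch based on the description above and on the Mozmill 1.5 CLI usage shown earlier in this bug; the real attachment may differ):

    import os
    import sys
    import mozmill

    # run a single test via the Mozmill 1.5 CLI (invoked with -b and -t)
    mozmill.CLI(args=sys.argv[1:]).run()

    # os.wait()  # uncommenting this reaps the defunct firefox-bin child

    os.system("ps -ef | grep firefox")  # one "(firefox-bin)" zombie remains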

I will now try your patch and check if it works.
Attachment #534886 - Attachment is obsolete: true
Comment on attachment 575557 [details] [diff] [review]
Mozmill 1.5 Fix for mac os x

Looks good and solves the problem on OS X. I have also tested on Linux, which I hadn't done before, and I'm not able to see this issue there. So it's really OS X only.

It would be great if we could immediately push a new version of mozmill/mozrunner to pypi.
Attachment #575557 - Flags: feedback?(hskupin) → feedback+
So we want mozrunner/mozmill bumped for the hotfix-1.5 branch and released to pypi?
(In reply to Jeff Hammel [:jhammel] from comment #49)
> So we want mozrunner/mozmill bumped for the hotfix-1.5 branch and released
> to pypi?

If we do, please file a bug and assign it to me.
see bug 704342 for the version bumping bug
There is a bug on test_driver.js in hotfix-2.0. I need to figure this out before landing there.
Clint, it looks like we haven't covered every possible path where defunct processes are left behind. Today I saw that when the jsbridge port is already in use, we horribly end up with a lot of such processes when trying to execute Mozmill a second time in parallel. It won't happen that often, but given that our process handling on 1.5 is broken and that we don't shut down Firefox in case of unhandled unexpected modal dialogs, we could run into these issues. So we should take care of this and test it for Mozmill 2.0.
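
A cheap guard against the port-in-use case would be to probe the jsbridge port before starting a second instance (a hypothetical check, not existing Mozmill code):

    import socket

    def jsbridge_port_free(host="127.0.0.1", port=24242):
        # if we can bind the port ourselves, no other jsbridge instance owns it
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            s.bind((host, port))
            return True
        except socket.error:
            return False
        finally:
            s.close()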
Whiteboard: [mozmill-2.0?]
Whiteboard: [mozmill-2.0?] → [mozmill-2.0+]
Whiteboard: [mozmill-2.0+] → [mozmill-2.0+][mozmill-1.5.7+]
Depends on: 718496
See bug 718496 for the regression introduced by the checkin into hotfix-1.5. Clint, what is the status of this bug for Mozmill 2.0?
Assignee: ctalbert → hskupin
Attached file Forward port to Mozbase master (v1) (obsolete) —
Pointer to Github pull-request
Attachment #622960 - Attachment description: Pointer to Github pull request: https://github.com/mozilla/mozbase/pull/12 → Forward port to Mozbase master (v1)
Attachment #622960 - Flags: review?(jhammel)
Attachment #575648 - Attachment mime type: text/x-python-script → text/plain
Comment on attachment 622960 [details]
Forward port to Mozbase master (v1)

comment in pull request
Attachment #622960 - Flags: review?(jhammel) → review+
So I should have tested it with my testcase attached to the bug first. This problem doesn't occur anymore with the latest mozprocess version on master.

import mozrunner
import os
import sys

for i in range(2):
  try:
    mozrunner.CLI(args=sys.argv[1:]).run()
  except Exception, e:
    print str(e)
  finally:
    pass

os.system("ps -ef | grep firefox")


Results in:

  501 48807 48799   0  9:03AM ttys007    0:00.00 sh -c ps -ef | grep firefox
  501 48809 48807   0  9:03AM ttys007    0:00.00 grep firefox

So I don't think we have to fix anything here for mozprocess.

Jeff and Clint, if you disagree feel free to reopen the bug.
Status: ASSIGNED → RESOLVED
Closed: 13 years ago → 12 years ago
Resolution: --- → FIXED
Whiteboard: [mozmill-2.0+][mozmill-1.5.7+] → [mozmill-1.5.7+]
Attachment #622960 - Attachment is obsolete: true
Attachment #575589 - Attachment is obsolete: true
Assignee: hskupin → ctalbert