proc_handler fix from bug 776931 breaks Mozmill shutdown logic

RESOLVED FIXED

Status

Testing
Mozbase
RESOLVED FIXED
6 years ago
6 years ago

People

(Reporter: whimboo, Assigned: whimboo)

Tracking

({regression})

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [mozmill-2.0+])

Attachments

(1 attachment)

(Assignee)

Description

6 years ago
The proc handler fix from bug 776931 breaks Mozmill shutdown logic. Especially the Mutt tests are now totally busted:

Traceback (most recent call last):
  File "/Volumes/data/code/mozmill/mutt/mutt/tests/python/usemozmill.py", line 11, in test_modules
    self.do_test(testpath, passes=1, fails=0, skips=0)
  File "/Volumes/data/code/mozmill/mutt/mutt/tests/python/usemozmill.py", line 25, in do_test
    results = m.run(tests)
  File "/Volumes/data/code/mozmill/mozmill/mozmill/__init__.py", line 413, in run
    self.stop_runner()
  File "/Volumes/data/code/mozmill/mozmill/mozmill/__init__.py", line 474, in stop_runner
    raise Exception('client process shutdown unsuccessful')
Exception: client process shutdown unsuccessful

I have seen this while working on bug 761564.

We should not update to the latest mozrunner code without having this issue fixed. Not sure yet if this is code we have to update in Mozmill or if it is mozbase related and we have a regression here in mozprocess.
DEBUG | mozmill.endRunner | true
TEST-UNEXPECTED-FAIL | Disconnect Error: Application unexpectedly closed
RESULTS | Passed: 0
RESULTS | Failed: 1
RESULTS | Skipped: 0
ERROR | Traceback (most recent call last):
ERROR |   File "/home/ahal/git/mozmill/mozmill/mozmill/__init__.py", line 705, in run
ERROR |     mozmill.run(tests, self.options.restart)
ERROR |   File "/home/ahal/git/mozmill/mozmill/mozmill/__init__.py", line 403, in run
ERROR |     self.stop_runner()
ERROR |   File "/home/ahal/git/mozmill/mozmill/mozmill/__init__.py", line 453, in stop_runner
ERROR |     raise Exception('client process shutdown unsuccessful')
ERROR | Exception: client process shutdown unsuccessful



So a JSBridgeDisconnect error is getting raised here and mozrunner's is_running method simply returns self.process_handler != None.. https://github.com/mozilla/mozbase/blob/master/mozrunner/mozrunner/runner.py#L150

The odd thing is that report_disconnect() doesn't get called here, so the Application unexpectedly closed message is coming from somewhere else. It is likely that runner.stop() isn't being called due to the unexpected disconnect which means that is_running() will still return True even though the process has stopped. This is a bug that should be fixed in mozrunner, but the real problem for mozmill is to figure out why we are getting the Application unexpectedly closed in the first place.
(Assignee)

Comment 2

6 years ago
I will check what's going on here.
Assignee: nobody → hskupin
Status: NEW → ASSIGNED
(Assignee)

Comment 3

6 years ago
That's clearly a mozrunner issue here. The problematic line is:

if not getattr(self.process_handler.proc, 'returncode', False):

Whenever the process gets shutdown successful the returncode will be 0. That means we do not reset the process handler and mozrunner continues to return true for is_running().

I already have a patch which fixes the problem, but I will have to check how I can create a test for this case.
Assignee: hskupin → nobody
Component: Mozmill → Mozbase
(Assignee)

Updated

6 years ago
Assignee: nobody → hskupin
(Assignee)

Comment 4

6 years ago
Created attachment 650269 [details]
Pointer to Github pull request: https://github.com/mozilla/mozbase/pull/30

Pointer to Github pull-request
(Assignee)

Comment 5

6 years ago
Comment on attachment 650269 [details]
Pointer to Github pull request: https://github.com/mozilla/mozbase/pull/30

Simple fix as mentioned above. First we do not have to use getattr because waitForFinish will also return the returncode. Also if we not succeeded it will be not equal 0.
Attachment #650269 - Flags: review?(ahalberstadt)
(Assignee)

Comment 6

6 years ago
As discussed with Jeff on IRC we can't have a test for mozrunner right now. :/
(Assignee)

Updated

6 years ago
Blocks: 781105
Comment on attachment 650269 [details]
Pointer to Github pull request: https://github.com/mozilla/mozbase/pull/30

No, this won't work. process.waitForFinish will return 0 if everything goes ok, but also if it times out. It's set up like this because this how mozrunner works and also how python's threading library works.

You need to look at process.returncode to see if the process is running or not (it will be None if it is). I just checked and there is a convenience method in the subprocess module (http://docs.python.org/library/subprocess.html#subprocess.Popen.poll). You can change the if statement to:

if self.process_handler.proc.poll() == None:
    return
Attachment #650269 - Flags: review?(ahalberstadt) → review-
(Assignee)

Comment 8

6 years ago
(In reply to Andrew Halberstadt [:ahal] from comment #7)
> No, this won't work. process.waitForFinish will return 0 if everything goes
> ok, but also if it times out. It's set up like this because this how
> mozrunner works and also how python's threading library works.

Ah, I haven't watched closely enough. If the outThread times out we do return None but it's not a safe assumption to use. 

> if self.process_handler.proc.poll() == None:
>     return

And this also fixed the application disconnect issue I have seen on Windows with my previous patch. Thanks a lot!
(Assignee)

Comment 9

6 years ago
Comment on attachment 650269 [details]
Pointer to Github pull request: https://github.com/mozilla/mozbase/pull/30

Updated pull request based on the proposal.
Attachment #650269 - Flags: review- → review?(ahalberstadt)
(Assignee)

Comment 10

6 years ago
Andrew, I think the timeout we currently have is too short. While running a test I was starting a VM in parallel. This blocked the system for a couple of seconds and it caused the failure to appear again. Shouldn't we raise it to about 30 or 60s?
Comment on attachment 650269 [details]
Pointer to Github pull request: https://github.com/mozilla/mozbase/pull/30

Thanks for the patch! It doesn't matter to me about the timeout, you're talking about in mozmill right? As long as the default timeout in mozbase is None, feel free to adjust the mozmill one to whatever you want.
Attachment #650269 - Flags: review?(ahalberstadt) → review+
(Assignee)

Comment 12

6 years ago
(In reply to Andrew Halberstadt [:ahal] from comment #11)
> Thanks for the patch! It doesn't matter to me about the timeout, you're
> talking about in mozmill right? As long as the default timeout in mozbase is
> None, feel free to adjust the mozmill one to whatever you want.

Well, not sure about which timeout. We have the 1s for outThread and the timeout we pass in. I would assume that the code broke for the outThread timeout here when I started the VM in parallel.
(Assignee)

Comment 13

6 years ago
When I run the Python Mutt tests for Mozmill I still get a failure for the profile test:

ERROR: test_relative_path (testprofilepath.TestProfilePath)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Volumes/data/code/mozmill/mutt/mutt/tests/python/testprofilepath.py", line 38, in test_relative_path
    code = process.waitForFinish(timeout=120)
  File "/Volumes/data/code/mozbase/mozprocess/mozprocess/processhandler.py", line 713, in waitForFinish
    return self.proc.wait()
AttributeError: 'ProcessHandler' object has no attribute 'proc'

I will check what's wrong here.
(Assignee)

Comment 14

6 years ago
(In reply to Henrik Skupin (:whimboo) from comment #13)
> AttributeError: 'ProcessHandler' object has no attribute 'proc'

Actually this is a bug in the Mutt test, which doesn't call process.run(). I will fix this on another bug.

https://github.com/mozilla/mozbase/commit/011243a8bb212b1a75fcd083436c222471b362da
Status: ASSIGNED → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
(In reply to Henrik Skupin (:whimboo) from comment #12)
> Well, not sure about which timeout. We have the 1s for outThread and the
> timeout we pass in. I would assume that the code broke for the outThread
> timeout here when I started the VM in parallel.

Are you talking about this? https://github.com/mozilla/mozbase/blob/master/mozprocess/mozprocess/processhandler.py#L708

That doesn't timeout after 1 second, all it's doing is waking up so that the main thread can receive keyboard interrupts and then going back to sleep in the while loop. That bit of code only times out if the while loop executes for "timeout" seconds which was passed in to the method.
(Assignee)

Comment 16

6 years ago
Ah. Thanks for the clarification. Makes sense. Whenever I see it again I will make sure to check that with Mozmill and patch it if necessary.
You need to log in before you can comment on or make changes to this bug.