Closed Bug 965714 Opened 10 years ago Closed 10 years ago

[mozrunner] Invalid RunnerNotStartedError exception raised when wait() gets called and process already finished

Categories

(Testing :: Mozbase, defect)

All
macOS
defect
Not set
critical

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: whimboo, Assigned: whimboo)

References

Details

(Keywords: regression, Whiteboard: [mozmill-2.0.4+])

Attachments

(1 file, 1 obsolete file)

I tested our restart architecture tests with the latest version of mozmill on hotfix-2.0 and it fails on both places in handle_disconnect where we wait for the application to exit with:

TEST-UNEXPECTED-FAIL | Disconnect Error: Application unexpectedly closed
RESULTS | Passed: 5
RESULTS | Failed: 0
RESULTS | Skipped: 0
Traceback (most recent call last):
  File "/Volumes/data/code/mozmill/mozmill/mozmill/__init__.py", line 799, in run
    mozmill.run(tests, self.options.restart)
  File "/Volumes/data/code/mozmill/mozmill/mozmill/__init__.py", line 424, in run
    self.handle_disconnect()
  File "/Volumes/data/code/mozmill/mozmill/mozmill/__init__.py", line 499, in handle_disconnect
    self.runner.wait(timeout=self.jsbridge_timeout)
  File "/Volumes/data/envs/m2d/lib/python2.7/site-packages/mozrunner-5.32-py2.7.egg/mozrunner/base.py", line 113, in wait
    raise RunnerNotStartedError("Wait() called before process started")
RunnerNotStartedError: Wait() called before process started
This is actually a regression from bug 962495, because it directly polls the process_handler for the exit state now. With that we will leave mozrunner in a weird state because under those conditions the self.process_handler is not set back to None.

I think that we should stop resetting process_handler to none, but leave it existent as long as no other process has been started. Also we should only raise RunnerNotStartedError when no process has been started so far.

The only thing I'm not that sure about yet, is the initial None value of returncode. It could cause confusion if retrieved before a process has been started. Maybe we should make it a property and raise the same exception.
Blocks: 962495
Component: Mozmill → Mozbase
Keywords: regression
Summary: Mozmill can fail in waiting for the application on shutdown/restart if it quits too fast → [mozrunner] Invalid RunnerNotStartedError exception raised when wait() gets called and process already finished
Before I can create tests for such a scenario I have to write two extensions, which restart and shutdown the application after a couple of seconds. Reason is that this is a very specific regression which only happens when the application quits itself. None of our existent tests could catch this.
Actually I can circumvent it by calling runner.process_handler.kill() directly, so the Runner doesn't know about it.
Attached patch Patch v1 (obsolete) — Splinter Review
Ok, this patch makes kinda lot of changes for a better handling of the process handler and return code. I hope that's fine. I'm not sure if that causes any conflicts with projects outside of mozbase. At least all tests on mozbase are passing on OS X (linux and windows tests upcoming), and also Mozmill is fine. But if you want a try server run, let me know and I can get it started.
Attachment #8367885 - Flags: review?(ahalberstadt)
Comment on attachment 8367885 [details] [diff] [review]
Patch v1

Review of attachment 8367885 [details] [diff] [review]:
-----------------------------------------------------------------

I really like these changes. I have some comments, but nothing major.

::: mozrunner/mozrunner/base.py
@@ +52,5 @@
> +    def returncode(self):
> +        if isinstance(self.process_handler, subprocess.Popen):
> +            return self.process_handler.poll()
> +        elif isinstance(self.process_handler, ProcessHandler):
> +            return self.process_handler.proc.poll()

Didn't we add a poll method to mozprocess? Or not yet..

@@ +137,5 @@
>          """
> +        try:
> +            if not self.is_running():
> +                return
> +        except RunnerNotStartedError:

I would remove the try: except here. I think calling stop() before calling start() is a valid reason to raise an exception.

::: mozrunner/tests/test_interactive.py
@@ +44,5 @@
> +
> +    def test_wait_after_process_finished(self):
> +        """Wait after the process has been stopped should not raise an error"""
> +        self.runner.start(interactive=True)
> +        sleep(5)

Anyway we can make this any smaller? This is going to add about 1 minute per push to our infrastructure load, which may not be too bad in comparison to other tests, but it adds up. I'd almost be inclined to say the cost/benefit would swing in favour of not having this test at all.
Attachment #8367885 - Flags: review?(ahalberstadt) → review+
(In reply to Andrew Halberstadt [:ahal] from comment #5)
> > +    def returncode(self):
> > +        if isinstance(self.process_handler, subprocess.Popen):
> > +            return self.process_handler.poll()
> > +        elif isinstance(self.process_handler, ProcessHandler):
> > +            return self.process_handler.proc.poll()
> 
> Didn't we add a poll method to mozprocess? Or not yet..

Not yet. I filed the bug yesterday.

> @@ +137,5 @@
> >          """
> > +        try:
> > +            if not self.is_running():
> > +                return
> > +        except RunnerNotStartedError:
> 
> I would remove the try: except here. I think calling stop() before calling
> start() is a valid reason to raise an exception.

We are calling stop() inside of start(). So at one position we have to add this check. Tell me what you would prefer. Same for clean-up, where we would also raise this error if runner has never been started.

> ::: mozrunner/tests/test_interactive.py
> @@ +44,5 @@
> > +
> > +    def test_wait_after_process_finished(self):
> > +        """Wait after the process has been stopped should not raise an error"""
> > +        self.runner.start(interactive=True)
> > +        sleep(5)
> 
> Anyway we can make this any smaller? This is going to add about 1 minute per
> push to our infrastructure load, which may not be too bad in comparison to
> other tests, but it adds up. I'd almost be inclined to say the cost/benefit
> would swing in favour of not having this test at all.

Why 1 minute per push? Across all the platforms? But yeah, I can make it smaller and even remove it completely.
(In reply to Henrik Skupin (:whimboo) from comment #6)
> > I would remove the try: except here. I think calling stop() before calling
> > start() is a valid reason to raise an exception.
> 
> We are calling stop() inside of start(). So at one position we have to add
> this check. Tell me what you would prefer. Same for clean-up, where we would
> also raise this error if runner has never been started.

Gna, is_running() should actually never raise an exception. Instead it should catch it and return False. That would fix it. Working on an update.
Attached patch Patch v2Splinter Review
Updated as mentioned in the previous comments.
Attachment #8367885 - Attachment is obsolete: true
Attachment #8367976 - Flags: review?(ahalberstadt)
Comment on attachment 8367976 [details] [diff] [review]
Patch v2

Review of attachment 8367976 [details] [diff] [review]:
-----------------------------------------------------------------

Yeah, it would be 5 seconds per build job of which there are over 20. Granted not all of them run on every push.. but still. Anyway, thanks for updating it. Lgtm! I wouldn't be surprised if this cause some sort of bustage on m-c though, so maybe a try run would be good.
Attachment #8367976 - Flags: review?(ahalberstadt) → review+
Had to push again because I missed to remove the mozinstall patch which introduced the pefile package:
https://tbpl.mozilla.org/?tree=Try&rev=0ebd5746ca59
Well, the package dependencies are getting more complex on m-c those days given that even mozversion is not mirrored yet. So no results from the unit tests, but all other types of tests are passing. I will do a full unit test run on Windows, and if all passes we are good to get this patch landed!
All green so I went ahead and got it merged:
https://github.com/mozilla/mozbase/commit/eafc6ef9164b63c7ff15a22e6e9240e646765192
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Blocks: 967058
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: