Closed Bug 776931 Opened 12 years ago Closed 12 years ago

Mozrunner has two threads in proc_handler.waitForFinish()

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: ahal, Assigned: ahal)

References

Details

Attachments

(3 files, 3 obsolete files)

Ideal patch that would break everything 12 years ago Andrew Halberstadt [:ahal] 5.30 KB, patch		Details \| Diff \| Splinter Review
Hackier patch that would break nothing 12 years ago Andrew Halberstadt [:ahal] 2.09 KB, patch	k0scist : review+	Details \| Diff \| Splinter Review
Patch 3 12 years ago Andrew Halberstadt [:ahal] 7.75 KB, patch	cmtalbert : feedback+	Details \| Diff \| Splinter Review
Patch 4.0 - Don't worry about backwards compatibility 12 years ago Andrew Halberstadt [:ahal] 8.44 KB, patch	k0scist : review+	Details \| Diff \| Splinter Review
Patch 4.1 - Incremental 12 years ago Andrew Halberstadt [:ahal] 2.96 KB, patch	k0scist : review+	Details \| Diff \| Splinter Review
Fixes mozprocess tests 12 years ago Andrew Halberstadt [:ahal] 11.69 KB, patch	k0scist : review+	Details \| Diff \| Splinter Review

Andrew Halberstadt [:ahal]

Assignee

Description

•

12 years ago

With the patch to bug 661343, a thread is implicitly spawned inside runner.start() to handle output.

This is fine, unless an application (like peptest) also calls self.runner.wait(). Now there are two threads both processing output. This in itself also isn't too bad. The problem occurs after output is finished and proc_handler.proc.wait() is called. In this method a "FINISHED" message is put on a queue, and will be consumed by the first thread to enter wait(). The next thread will timeout and a Queue.Empty exception will get raised.

See:
https://github.com/mozilla/mozbase/blob/eea3f2462c8fb8ec6056af940d2936ec5885d01e/mozrunner/mozrunner/runner.py#L183
https://github.com/mozilla/mozbase/blob/eea3f2462c8fb8ec6056af940d2936ec5885d01e/mozprocess/mozprocess/processhandler.py#L409

This is currently responsible for peptest bustage on windows:
https://tbpl.mozilla.org/?noignore=1

Not sure why it isn't breaking Linux (maybe a timing thing?)

Andrew Halberstadt [:ahal]

Assignee

Updated

•

12 years ago

Assignee: nobody → ahalberstadt

Status: NEW → ASSIGNED

Andrew Halberstadt [:ahal]

Assignee

Comment 1

•

12 years ago

Attached patch Ideal patch that would break everything (obsolete) — Details — Splinter Review

I think the fix from bug 661343 is actually the right approach. Mozrunner should always read output, whether or not the application calls runner.wait().

I think the ideal fix is to go a step further and completely separate reading output from waiting. These seem like two separate tasks, and I'm not sure why they are tightly coupled.

The above patch achieves this, but alas, it will break everything that currently uses mozrunner/mozprocess. Sigh.

Henrik Skupin [:whimboo][⌚️UTC+1]

Comment 2

•

12 years ago

Could this be the reason why we are timing out in Mozmill too on Windows and Linux? See bug 764643.

Andrew Halberstadt [:ahal]

Assignee

Comment 3

•

12 years ago

Attached patch Hackier patch that would break nothing (obsolete) — Details — Splinter Review

This patch would fix the problem. I made the default False since that way everything that uses mozrunner and calls runner.wait() won't need to be updated. But it means a change to mozmill will be needed.

I'm fine defaulting it to True, though other consumers of mozrunner may need to update their code in this case.

Anyway, maybe there's a compromise between this patch and the other one.

Attachment #645322 - Flags: review?(jhammel)

Andrew Halberstadt [:ahal]

Assignee

Updated

•

12 years ago

Blocks: 774817

Andrew Halberstadt [:ahal]

Assignee

Updated

•

12 years ago

Comment 4

•

12 years ago

(In reply to Henrik Skupin (:whimboo) from comment #2)
> Could this be the reason why we are timing out in Mozmill too on Windows and
> Linux? See bug 764643.

The stack trace you pasted for Windows looks like the same problem. This issue doesn't seem to happen for Peptest in Linux, but having two threads doing the same thing is pretty race condition prone, so who knows?

Henrik Skupin [:whimboo][⌚️UTC+1]

Updated

•

12 years ago

Blocks: 764643

Jeff Hammel

Comment 5

•

12 years ago

Comment on attachment 645322 [details] [diff] [review]
Hackier patch that would break nothing

This will work.  TBH, I think we're going to have a lot of bugs related to the way we process output in a thread.  Sounds easy, but... That said, I don't have a better solution right now.  We (=someone) should really take a look at the whole mozprocess and mozrunner architecture and see if what we're doing is at all sane.

Attachment #645322 - Flags: review?(jhammel) → review+

Andrew Halberstadt [:ahal]

Assignee

Comment 6

•

12 years ago

IMO, the first patch I posted is the way things should be done. I just don't know how to land it without breaking everything.

This patch also won't help out mozmill at all. If mozmill sets read_output=True, mozmill will have the problem described here and if we leave it to False, mozmill will have the problem described in bug 661343.

The reason is that mozmill uses runner.wait() as it appears after applying the first patch (see https://github.com/mozilla/mozmill/blob/master/mozmill/mozmill/__init__.py#L451). Peptest on the other hand calls runner.wait() immediately after calling runner.start() so we see output immediately.

I'm going to see if I can come up with a better patch.

Andrew Halberstadt [:ahal]

Assignee

Comment 7

•

12 years ago

Attached patch Patch 3 (obsolete) — Details — Splinter Review

This patch should support both methods of using runner.start/wait (e.g peptest usage vs mozmill usage) as well as fix both the issue described here and the issue described in bug 661343 as well as maintain complete backwards compatibility.

It is a bit bulkier and therefore scarier than the other patch, so I wouldn't check this in until we do thorough testing on several platforms and test harnesses.

Let me know what you think.

Attachment #645402 - Flags: feedback?(jhammel)

Attachment #645402 - Flags: feedback?(ctalbert)

cmtalbert

Comment 8

•

12 years ago

Comment on attachment 645402 [details] [diff] [review]
Patch 3

I'm fine with breaking things upstream (and thus requiring changes to them) if this is the right way to solve this problem. This patch looks pretty seamless and I really like it. If we take this and it requires changes in the other harnesses (peptest, mozmill etc) then we should see those diffs attached here as well and we should bump versions at that time. We can still control the entire ecosystem here so fixing things correctly is more important to me than ensuring we don't change APIs.

I like the approach.

Attachment #645402 - Flags: feedback?(ctalbert) → feedback+

Andrew Halberstadt [:ahal]

Assignee

Comment 9

•

12 years ago

Attached patch Patch 4.0 - Don't worry about backwards compatibility — Details — Splinter Review

This seems to fix both issues in both peptest and mozmill (though in mozmill I see a ton of "Window Hidden" messages, is this a known issue?)

I will do testing in windows and mac for various harnesses tomorrow. Note that this will break backwards compatibility as the mozprocess arguments timeout and outputTimeout have been moved from mozrunner.wait() to mozrunner.start()

Attachment #645888 - Flags: review?(jhammel)

Andrew Halberstadt [:ahal]

Assignee

Comment 10

•

12 years ago

It just occurred to me that maybe it would be better to call "processOutput" from within mozprocess.run() instead of mozrunner.start(). Not sure which way is better...

Jeff Hammel

Updated

•

12 years ago

Attachment #645402 - Flags: feedback?(jhammel)

Jeff Hammel

Comment 11

•

12 years ago

Comment on attachment 645888 [details] [diff] [review]
Patch 4.0 - Don't worry about backwards compatibility

Looks good.  If we are going to break backwards compatibility, I'd like to coordinate a release, as there may be other things worth taking, like bug 758233

Attachment #645888 - Flags: review?(jhammel) → review+

Andrew Halberstadt [:ahal]

Assignee

Comment 12

•

12 years ago

I'm fine with coordinating a release, as long as we figure out what we need and get it done really quickly. This is currently blocking peptests from running on Windows (and may also fix the OSX issue).

Jeff Hammel

Comment 13

•

12 years ago

Go ahead and release while I figure out what to do with bug 758233 which will also break backwards compatability

Andrew Halberstadt [:ahal]

Assignee

Comment 14

•

12 years ago

Attached patch Patch 4.1 - Incremental — Details — Splinter Review

Sorry for the metric ton of reviews Jeff, but I realized that we should support this use case: https://github.com/mozilla/mozmill/blob/master/mozmill/mozmill/__init__.py#L451

Having a wait that you can't get out of seems like a bad idea. This will also make it more backwards compatible as it is only outputTimeout that will break.

Attachment #645314 - Attachment is obsolete: true

Attachment #645322 - Attachment is obsolete: true

Attachment #645402 - Attachment is obsolete: true

Attachment #646200 - Flags: review?(jhammel)

Jeff Hammel

Comment 15

•

12 years ago

Is this really only for x86_64 Windows 7 or should that bugzilla flag be changed?

Andrew Halberstadt [:ahal]

Assignee

Comment 16

•

12 years ago

At the very least it's for all Windows versions, but it seems to work on Linux.
https://tbpl.mozilla.org/?noignore=1

Peptest is also busted on OSX 10.7, but it's unclear if it is caused by the same bug or not. In https://bugzilla.mozilla.org/show_bug.cgi?id=774817#c9 you said you aren't sure how we are getting a None there... well having two threads in that function at the same time might be one possible way of getting None :), though that's just a theory.

Jeff Hammel

Comment 17

•

12 years ago

Comment on attachment 646200 [details] [diff] [review]
Patch 4.1 - Incremental

+            if not getattr(self.process_handler.proc, 'returncode', False):

Can we not check for if the return value of waitForFinish is None? (I honestly don't know).

Other than that, looks great!

Attachment #646200 - Flags: review?(jhammel) → review+

Andrew Halberstadt [:ahal]

Assignee

Comment 18

•

12 years ago

(In reply to Jeff Hammel [:jhammel] from comment #17)
> +            if not getattr(self.process_handler.proc, 'returncode', False):
> 
> Can we not check for if the return value of waitForFinish is None? (I
> honestly don't know).

I could make waitForFinish return True if there was no timeout and False if there was, but it seem a little less explicit. Also, runner.wait() always returns None, and Thread.join(timeout) also always returns None. They both require a second call to see if the process is still running.

Andrew Halberstadt [:ahal]

Assignee

Comment 19

•

12 years ago

master: https://github.com/mozilla/mozbase/commit/3a2acf4a0966ce34286cde843e67bbd78145c054

Needs to be synced to m-c again.

Status: ASSIGNED → RESOLVED

Closed: 12 years ago

Resolution: --- → FIXED

Andrew Halberstadt [:ahal]

Assignee

Updated

•

12 years ago

Blocks: 778133

Andrew Halberstadt [:ahal]

Assignee

Comment 20

•

12 years ago

This breaks mozprocess1.py, but the test should be updated as things are done differently now: http://k0s.org:8010/builders/mozbase%20-%20linux/builds/89/steps/shell_5/logs/stdio

I'll have a patch soon.

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Andrew Halberstadt [:ahal]

Assignee

Comment 21

•

12 years ago

Attached patch Fixes mozprocess tests — Details — Splinter Review

I updated the tests to work with this commit. I also found that there were a few problems with the mozprocess tests. 1) The detection algorithm for Unix was broken. 2) The assertion errors were very confusing, sometimes positive, sometimes negative. I standardized them.

Attachment #646620 - Flags: review?(jhammel)

Jeff Hammel

Comment 22

•

12 years ago

Comment on attachment 646620 [details] [diff] [review]
Fixes mozprocess tests

+        for line in output.split('\n'):

.splitlines() would be better

+            elif processName in line and not 'defunct' in line:

We should really not be doing this in this test.  I'm tempted just to disable both of these tests until we have a real solution, but this is fine for now.  I'll just file another bug to fix them

Attachment #646620 - Flags: review?(jhammel) → review+

Andrew Halberstadt [:ahal]

Assignee

Comment 23

•

12 years ago

https://github.com/mozilla/mozbase/commit/59a058e64b6fa911969864285cd9a2ec4cfd707d

Status: REOPENED → RESOLVED

Closed: 12 years ago → 12 years ago

Resolution: --- → FIXED

Jeff Hammel

Comment 24

•

12 years ago

> +            elif processName in line and not 'defunct' in line:
> 
> We should really not be doing this in this test.  I'm tempted just to
> disable both of these tests until we have a real solution, but this is fine
> for now.  I'll just file another bug to fix them

Ticketed: bug 778267

Henrik Skupin [:whimboo][⌚️UTC+1]

Comment 25

•

12 years ago

Andrew, when do you plan to push updated versions of the affected packages to PyPI?

Henrik Skupin [:whimboo][⌚️UTC+1]

Updated

•

12 years ago

Depends on: 780841

Henrik Skupin [:whimboo][⌚️UTC+1]

Updated

•

12 years ago

Depends on: 781473

Jonathan Griffin (:jgriffin)

Updated

•

12 years ago

Depends on: 783132

You need to log in before you can comment on or make changes to this bug.