Open Bug 1421289 Opened 7 years ago Updated 2 years ago

[mozprocess] mozprocess ProcessReader waits for stdout/stderr threads after timeout

Categories

(Testing :: Mozbase, defect, P3)

defect

Tracking

(Not tracked)

People

(Reporter: gbrown, Unassigned)

References

(Depends on 1 open bug)

Details

Attachments

(1 file)

I encountered this while investigating current mochitest hangs. Consider the case where an output timeout has triggered a break from the loop at

https://dxr.mozilla.org/mozilla-central/rev/cad9c9573579698c223b4b6cb53ca723cd930ad2/testing/mozbase/mozprocess/mozprocess/processhandler.py#980

Now the ProcessReader continues to wait for the stdout/stderr reader threads to end:

https://dxr.mozilla.org/mozilla-central/rev/cad9c9573579698c223b4b6cb53ca723cd930ad2/testing/mozbase/mozprocess/mozprocess/processhandler.py#996-999

I think those threads normally end because the test harness output timeout handler has killed the process (typically via mozcrash or similar killAndGetStack()), causing readline() to return None:

https://dxr.mozilla.org/mozilla-central/rev/cad9c9573579698c223b4b6cb53ca723cd930ad2/testing/mozbase/mozprocess/mozprocess/processhandler.py#933-935

But what if the process didn't die (bug 1404190?), and the stream remains open? The ProcessReader knows it has timed out, has drained its queue and isn't interested in any more output, but it still waits for those threads...possibly forever.
See Also: → 1421295
I don't know how to proceed here. The existing code is efficient and straight forward, blocking on readline(). It looks to be very difficult to interrupt a thread in python...and probably dangerous. Other than killing the process (since we are having trouble with that), is there a way to break out of readline()? Close the stream??

asyncio could be helpful...but it is only available in python 3, afaik.
(In reply to Geoff Brown [:gbrown] from comment #1)
> Close the stream??

I tried closing stdout/stderr from another thread (threading.Timer), but it didn't work. It just threw:

IOError: close() called during concurrent operation on the same file object.

and left the reader thread blocked on readline().
Bug https://bugzilla.mozilla.org/show_bug.cgi?id=1391545 which is recommended for disable is depending on this bug.

:whimboo, :gbrown do you guys have any updates regarding dis bug?
Flags: needinfo?(hskupin)
Flags: needinfo?(gbrown)
I'm not working on this bug and don't have any new ideas for unblocking it.

This bug could theoretically cause bug 1391545, but I don't know if it *actually* is a cause of 1391545.
Flags: needinfo?(gbrown)
I don't have updates neither. A better approach would be to get Marionette client logging implemented via bug 1394381, which should also be on the deps list of bug 1391545. I will add it.
Flags: needinfo?(hskupin)
The attached testcase demonstrates the behavior as reported on this bug, whereby it may be a bit different.

In this example the parent process is gone, and child processes which are running in a different session and process group, cause the output thread to wait forever.
Assignee: nobody → hskupin
Status: NEW → ASSIGNED
Bug https://bugzilla.mozilla.org/show_bug.cgi?id=1407383 which is blocked by this one has 87 failures in the past 7 days.

:whimboo, could you please take a look?
Flags: needinfo?(hskupin)
As you can see on the newly added dependencies other bugs have to be fixed first. This is not an easy thing to do, also because it affects every piece in our automation and I don't want to regress something else.
Flags: needinfo?(hskupin)
Priority: -- → P1
I unassign myself until dependencies have been fixed.
Assignee: hskupin → nobody
Status: ASSIGNED → NEW
Priority: P1 → P3
Assignee: nobody → gbrown
Priority: P3 → P1

Looking at bug 1521447 and bug 1535695 recently, this issue came up again: On Windows 7, in marionette restart tests, sometimes a child process appears to hang after kill(). If we abandon subsequent process waits with timeouts, the process_handler ends up waiting for the ProcessReader, which is hung on readline().

See Also: → 1521447, 1535695

(In reply to Geoff Brown [:gbrown] from comment #2)

(In reply to Geoff Brown [:gbrown] from comment #1)

Close the stream??

I tried closing stdout/stderr from another thread (threading.Timer), but it
didn't work. It just threw:

IOError: close() called during concurrent operation on the same file object.

and left the reader thread blocked on readline().

I realized I need to use a lock to guard the stdout/stderr stream access; then another thread can close the streams, and readline() is unblocked.

See Also: → 1532630

(In reply to Geoff Brown [:gbrown] from comment #11)

I realized I need to use a lock to guard the stdout/stderr stream access; then another thread can close the streams, and readline() is unblocked.

Lock allows another thread to safely access the stream. BUT, the stream readline needs to be guarded by the Lock too, of course, so in the case where readline blocks, the other thread cannot acquire the lock to close the stream -- we just achieve another deadlock.

Assignee: gbrown → nobody
Priority: P1 → P3
Severity: normal → S3

The severity field for this bug is relatively low, S3. However, the bug has 5 See Also bugs.
:gbrown, could you consider increasing the bug severity?

For more information, please visit auto_nag documentation.

Flags: needinfo?(gbrown)

I think the See Also bugs reflect the issue's complexity. S3 seems appropriate.

Flags: needinfo?(gbrown)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: