Closed Bug 1705385 Opened 4 years ago Closed 4 years ago

mozprocess ProcessHandler.kill can hang

Categories

(Testing :: Mozbase, defect)

Default
defect

Tracking

(firefox90 fixed)

RESOLVED FIXED
90 Branch
Tracking Status
firefox90 --- fixed

People

(Reporter: jgraham, Assigned: jgraham)

References

Details

Attachments

(8 files)

The TSAN tests are causing problems in wpt; the wpt harness tries to signal the Firefox process using mozrunner::Runner.stop(), which ends up calling kill() in ProcessHandlerMixin, which (unlike the subprocess::Popen case) calls wait, which (on unix) ends up calling os.waitpid which will never return if the process doesn't quit. So this precludes a pattern like "send SIGTERM and if that fails send SIGKILL".

Looking at the code in mozprocess, it's unclear what problem the _custom_wait function (on posix) is solving that isn't solved by the modern stdlib wait; afaict they both end up calling os.waitpid to do the actual waiting, but the stdlib version has support for a timeout. The comments in mozprocess suggest it's supposed to handle process trees better, but I can't see anything in the implementation to support this; possibly the stdlib implementation improved in the last 10 years.

I suggest we allow passing a timeout into mozrunner::stop and replacing the mozprocess _custom_wait function with stdlib wait to support that.

This is not obviously doing more than the stdlib code does these days,
and also doesn't support timeouts, so we're perhaps better just always
defering to the stdlib.

Assignee: nobody → james
Status: NEW → ASSIGNED

Unlike the stdlib Popen class, ProcessHandler.kill implictly waits for
the process to end. To avoid hangs in the case where the process
doesn't end, allow passing through a timeout to this function (on
posix only).

This is somewhat unfortunate as it means that ProcessHandler isn't
interchangable with Popen. But the only other option here appears to
be not doing the implicit wait which presumably consumers are relying on.

This allows us to handle the case where the runner doesn't stop
e.g. by using a more aggressive signal.

This allows the SIGTERM-then-SIGKILL logic to work as intended.

See Also: → 1521447, 1535695, 1536496, 1697308

This is not obviously doing more than the stdlib code does these days,
and also doesn't support timeouts, so we're perhaps better just always
defering to the stdlib.

Unlike the stdlib Popen class, ProcessHandler.kill implictly waits for
the process to end. To avoid hangs in the case where the process
doesn't end, allow passing through a timeout to this function (on
posix only).

This is somewhat unfortunate as it means that ProcessHandler isn't
interchangable with Popen. But the only other option here appears to
be not doing the implicit wait which presumably consumers are relying on.

This allows us to handle the case where the runner doesn't stop
e.g. by using a more aggressive signal.

This allows the SIGTERM-then-SIGKILL logic to work as intended.

Pushed by james@hoppipolla.co.uk: https://hg.mozilla.org/integration/autoland/rev/1e8842ace029 [mozprocess] Remove custom process wait code on posix, r=ahal https://hg.mozilla.org/integration/autoland/rev/4dc5ae176466 Allow passing a timeout to mozprocess.ProcessHandler.kill, r=ahal https://hg.mozilla.org/integration/autoland/rev/f230efb7a553 Allow passing a timeout to mozrunner.Runner.stop, r=ahal https://hg.mozilla.org/integration/autoland/rev/44a9d348b719 Pass a timeout when signalling firefox in wptrunner, r=ahal
Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/28627 for changes under testing/web-platform/tests
Upstream PR merged by moz-wptsync-bot
See Also: → 1738298
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: