Closed Bug 1438830 Opened 2 years ago Closed 1 year ago

[mozprocess] Track and kill detached child processes on Windows

Categories

(Testing :: Mozbase, defect, P1)

Version 3
defect

Tracking

(firefox63 fixed, firefox64 fixed)

RESOLVED FIXED
mozilla64
Tracking Status
firefox63 --- fixed
firefox64 --- fixed

People

(Reporter: whimboo, Assigned: whimboo)

References

(Blocks 2 open bugs)

Details

Attachments

(1 file)

Similar to bug 1176758 which implemented this for Posix we seem to need the same technique for Windows. Something got changed in the last couple of months and since then we are not able to correctly handle (find) the child process anymore.

It means we get output like the following:

11:50:19     INFO -  WARNING | IO Completion Port failed to signal process shutdown
11:50:19     INFO -  Parent process 7732 exited with children alive:
11:50:19     INFO -  PIDS: 1896, 4684, 668, 132
11:50:19     INFO -  Attempting to kill them, but no guarantee of success
Priority: -- → P3
I had a quick discussion with Aaron over on bug 1488872 about a possible idea, but as he mentioned that starter process will not help us.

So what we currently do is to use an I/O completion port to track process groups. See the following code of 

mozprocess:https://dxr.mozilla.org/mozilla-central/rev/c2e3be6a1dd352b969a45f0b85e87674e24ad284/testing/mozbase/mozprocess/mozprocess/processhandler.py#228

I have to admit that I don't know that much about all that yet. So bare with me in regards of details.

But what I can remember is that especially after an update of Firefox Marionette wasn't able to kill the process via mozprocess, because the process id has been changed. I haven't had the time yet to check since when this is actually happening. If it helps I could clearly try it. I could also provide a little Marionette test which could help to reproduce this.

Aaron, I would appreciate any kind of help, even only some tips in how to progress on this bug. Maybe you would have a bit of time to check the mozprocess code for job objects? It's old code and there is maybe a wrong usage, or something changed the last years with the new Windows versions. Thanks a lot.
Flags: needinfo?(aklotz)
You're already using a Job object, which is great! I think that the only part is missing is that you need to tweak one field.

This line:
https://dxr.mozilla.org/mozilla-central/rev/c2e3be6a1dd352b969a45f0b85e87674e24ad284/testing/mozbase/mozprocess/mozprocess/processhandler.py#324

needs a couple of tweaks.

1. I recommend that you do *not* set the winprocess.JOB_OBJECT_LIMIT_BREAKAWAY_OK flag unless < Windows 8.
2. I recommend that you always set winprocess.JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE. This will force all processes inside the job to terminate when you close the job handle.

I *think* that should be enough to give you what you want, at least on Win8+. You might still have some issues on Windows 7, because reasons.
Flags: needinfo?(aklotz)
(In reply to Aaron Klotz [:aklotz] from comment #2)
> 1. I recommend that you do *not* set the
> winprocess.JOB_OBJECT_LIMIT_BREAKAWAY_OK flag unless < Windows 8.

Actually, instead of that version check, you want to set that flag only when _can_nest_jobs() is True.
(In reply to Aaron Klotz [:aklotz] from comment #3)
> (In reply to Aaron Klotz [:aklotz] from comment #2)
> > 1. I recommend that you do *not* set the
> > winprocess.JOB_OBJECT_LIMIT_BREAKAWAY_OK flag unless < Windows 8.
> 
> Actually, instead of that version check, you want to set that flag only when
> _can_nest_jobs() is True.

omfg. False! You only want to set that flag when _can_nest_jobs() is False! Sorry!
Thanks a lot Aaron! The flag JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE is exactly what we need here to be able to kill Firefox **after** it was restarted. I tested it with the "Restart with extensions disabled" menu option. I will still have to check if that also works for updates, but I hope it will be the same.

Mind explaining me why `JOB_OBJECT_LIMIT_BREAKAWAY_OK` should only be set when `_can_nest_jobs()` is True? Why would it hurt otherwise?
Flags: needinfo?(aklotz)
Assignee: nobody → hskupin
Status: NEW → ASSIGNED
Priority: P3 → P1
(In reply to Henrik Skupin (:whimboo) from comment #5)
> Mind explaining me why `JOB_OBJECT_LIMIT_BREAKAWAY_OK` should only be set
> when `_can_nest_jobs()` is True? Why would it hurt otherwise?

As I said in comment 4, I actually got that backwards the first time ;-). You only want to set it when _can_nest_jobs() is false.

But the reason is this:

Since our sandbox also creates jobs, we preferably want to nest the sandbox's job inside the job that you're creating here. Specifying JOB_OBJECT_LIMIT_BREAKAWAY_OK would allow sandboxed processes within the job to break off and form their own, separate job. This is actually necessary for sandboxing to work on older versions of Windows without nested job support, but on newer versions, we don't want them breaking away.
Flags: needinfo?(aklotz)
(In reply to Aaron Klotz [:aklotz] from comment #6)
> As I said in comment 4, I actually got that backwards the first time ;-).
> You only want to set it when _can_nest_jobs() is false.

Ups, I missed the 2nd comment when replying. I will make sure to use it the right way.

> Since our sandbox also creates jobs, we preferably want to nest the
> sandbox's job inside the job that you're creating here. Specifying
> JOB_OBJECT_LIMIT_BREAKAWAY_OK would allow sandboxed processes within the job
> to break off and form their own, separate job. This is actually necessary
> for sandboxing to work on older versions of Windows without nested job
> support, but on newer versions, we don't want them breaking away.

Great explanation. I will add a limited description of that in mozprocess, and the full content in
the commit message if you don't mind.
Allow mozprocess to track and kill processes on Windows,
even when they got restarted.

Such processes are still part of the job object, but unless
the "JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE" flag is set, they
aren't getting closed.

Since our sandbox also creates jobs, we preferably want to
nest the sandbox's job inside the job as created by mozprocess.
Specifying "JOB_OBJECT_LIMIT_BREAKAWAY_OK" would allow
sandboxed processes within the job to break off and form their
own, separate job. This is actually necessary for sandboxing
to work on older versions of Windows without nested job support,
but on newer versions, we don't want them breaking away.
Attachment #9010531 - Flags: review?(aklotz)
So the patch is actually not complete yet. What we would need in terms of Marionette is that mozprocess would also know about the new and detached process id. In case of Posix we simply update it with the process id which Marionette gets via Firefox:

https://dxr.mozilla.org/mozilla-central/rev/a9e339b3e5d8dc3b108d3dc8b85f72b2235fafb2/testing/mozbase/mozprocess/mozprocess/processhandler.py#910

But how can we overwrite the process id (`self.pid`) on Windows? Aaron, is there a simple way of doing that? Or any alternative?
Comment on attachment 9010531 [details] [diff] [review]
[mozprocess] Track and kill detached processes on Windows

Review of attachment 9010531 [details] [diff] [review]:
-----------------------------------------------------------------

Lets try Geoff for the review, so that we can get this important patch landed.
Attachment #9010531 - Flags: review?(aklotz) → review?(gbrown)
(In reply to Henrik Skupin (:whimboo) from comment #10)
> So the patch is actually not complete yet. What we would need in terms of
> Marionette is that mozprocess would also know about the new and detached
> process id. In case of Posix we simply update it with the process id which
> Marionette gets via Firefox:
> 
> https://dxr.mozilla.org/mozilla-central/rev/
> a9e339b3e5d8dc3b108d3dc8b85f72b2235fafb2/testing/mozbase/mozprocess/
> mozprocess/processhandler.py#910
> 
> But how can we overwrite the process id (`self.pid`) on Windows? Aaron, is
> there a simple way of doing that? Or any alternative?

I moved this extra part to bug 1493796.
No longer blocks: 1420372
Comment on attachment 9010531 [details] [diff] [review]
[mozprocess] Track and kill detached processes on Windows

Review of attachment 9010531 [details] [diff] [review]:
-----------------------------------------------------------------

This seems reasonable and consistent with the discussion here.
Attachment #9010531 - Flags: review?(gbrown) → review+
Pushed by hskupin@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/afe0beb5f125
[mozprocess] Track and kill detached processes on Windows. r=gbrown
https://hg.mozilla.org/mozilla-central/rev/afe0beb5f125
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla64
While we can still uplift to beta, I would propose to do that for stability of this test-only patch. Thanks.
Whiteboard: [checkin-needed-beta]
You need to log in before you can comment on or make changes to this bug.