Closed
Bug 1438830
Opened 6 years ago
Closed 6 years ago
[mozprocess] Track and kill detached child processes on Windows
Categories
(Testing :: Mozbase, defect, P1)
Tracking
(firefox63 fixed, firefox64 fixed)
RESOLVED
FIXED
mozilla64
People
(Reporter: whimboo, Assigned: whimboo)
References
(Blocks 2 open bugs)
Details
Attachments
(1 file)
3.19 KB,
patch
|
gbrown
:
review+
|
Details | Diff | Splinter Review |
Similar to bug 1176758 which implemented this for Posix we seem to need the same technique for Windows. Something got changed in the last couple of months and since then we are not able to correctly handle (find) the child process anymore. It means we get output like the following: 11:50:19 INFO - WARNING | IO Completion Port failed to signal process shutdown 11:50:19 INFO - Parent process 7732 exited with children alive: 11:50:19 INFO - PIDS: 1896, 4684, 668, 132 11:50:19 INFO - Attempting to kill them, but no guarantee of success
Updated•6 years ago
|
Priority: -- → P3
Assignee | ||
Comment 1•6 years ago
|
||
I had a quick discussion with Aaron over on bug 1488872 about a possible idea, but as he mentioned that starter process will not help us. So what we currently do is to use an I/O completion port to track process groups. See the following code of mozprocess:https://dxr.mozilla.org/mozilla-central/rev/c2e3be6a1dd352b969a45f0b85e87674e24ad284/testing/mozbase/mozprocess/mozprocess/processhandler.py#228 I have to admit that I don't know that much about all that yet. So bare with me in regards of details. But what I can remember is that especially after an update of Firefox Marionette wasn't able to kill the process via mozprocess, because the process id has been changed. I haven't had the time yet to check since when this is actually happening. If it helps I could clearly try it. I could also provide a little Marionette test which could help to reproduce this. Aaron, I would appreciate any kind of help, even only some tips in how to progress on this bug. Maybe you would have a bit of time to check the mozprocess code for job objects? It's old code and there is maybe a wrong usage, or something changed the last years with the new Windows versions. Thanks a lot.
Flags: needinfo?(aklotz)
Comment 2•6 years ago
|
||
You're already using a Job object, which is great! I think that the only part is missing is that you need to tweak one field. This line: https://dxr.mozilla.org/mozilla-central/rev/c2e3be6a1dd352b969a45f0b85e87674e24ad284/testing/mozbase/mozprocess/mozprocess/processhandler.py#324 needs a couple of tweaks. 1. I recommend that you do *not* set the winprocess.JOB_OBJECT_LIMIT_BREAKAWAY_OK flag unless < Windows 8. 2. I recommend that you always set winprocess.JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE. This will force all processes inside the job to terminate when you close the job handle. I *think* that should be enough to give you what you want, at least on Win8+. You might still have some issues on Windows 7, because reasons.
Flags: needinfo?(aklotz)
Comment 3•6 years ago
|
||
(In reply to Aaron Klotz [:aklotz] from comment #2) > 1. I recommend that you do *not* set the > winprocess.JOB_OBJECT_LIMIT_BREAKAWAY_OK flag unless < Windows 8. Actually, instead of that version check, you want to set that flag only when _can_nest_jobs() is True.
Comment 4•6 years ago
|
||
(In reply to Aaron Klotz [:aklotz] from comment #3) > (In reply to Aaron Klotz [:aklotz] from comment #2) > > 1. I recommend that you do *not* set the > > winprocess.JOB_OBJECT_LIMIT_BREAKAWAY_OK flag unless < Windows 8. > > Actually, instead of that version check, you want to set that flag only when > _can_nest_jobs() is True. omfg. False! You only want to set that flag when _can_nest_jobs() is False! Sorry!
Assignee | ||
Comment 5•6 years ago
|
||
Thanks a lot Aaron! The flag JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE is exactly what we need here to be able to kill Firefox **after** it was restarted. I tested it with the "Restart with extensions disabled" menu option. I will still have to check if that also works for updates, but I hope it will be the same. Mind explaining me why `JOB_OBJECT_LIMIT_BREAKAWAY_OK` should only be set when `_can_nest_jobs()` is True? Why would it hurt otherwise?
Flags: needinfo?(aklotz)
Assignee | ||
Updated•6 years ago
|
Assignee: nobody → hskupin
Status: NEW → ASSIGNED
Priority: P3 → P1
Comment 6•6 years ago
|
||
(In reply to Henrik Skupin (:whimboo) from comment #5) > Mind explaining me why `JOB_OBJECT_LIMIT_BREAKAWAY_OK` should only be set > when `_can_nest_jobs()` is True? Why would it hurt otherwise? As I said in comment 4, I actually got that backwards the first time ;-). You only want to set it when _can_nest_jobs() is false. But the reason is this: Since our sandbox also creates jobs, we preferably want to nest the sandbox's job inside the job that you're creating here. Specifying JOB_OBJECT_LIMIT_BREAKAWAY_OK would allow sandboxed processes within the job to break off and form their own, separate job. This is actually necessary for sandboxing to work on older versions of Windows without nested job support, but on newer versions, we don't want them breaking away.
Flags: needinfo?(aklotz)
Assignee | ||
Comment 7•6 years ago
|
||
(In reply to Aaron Klotz [:aklotz] from comment #6) > As I said in comment 4, I actually got that backwards the first time ;-). > You only want to set it when _can_nest_jobs() is false. Ups, I missed the 2nd comment when replying. I will make sure to use it the right way. > Since our sandbox also creates jobs, we preferably want to nest the > sandbox's job inside the job that you're creating here. Specifying > JOB_OBJECT_LIMIT_BREAKAWAY_OK would allow sandboxed processes within the job > to break off and form their own, separate job. This is actually necessary > for sandboxing to work on older versions of Windows without nested job > support, but on newer versions, we don't want them breaking away. Great explanation. I will add a limited description of that in mozprocess, and the full content in the commit message if you don't mind.
Assignee | ||
Comment 8•6 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=05c882dac1d1fb96647209bc6d4044ba513e1d48
Assignee | ||
Comment 9•6 years ago
|
||
Allow mozprocess to track and kill processes on Windows, even when they got restarted. Such processes are still part of the job object, but unless the "JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE" flag is set, they aren't getting closed. Since our sandbox also creates jobs, we preferably want to nest the sandbox's job inside the job as created by mozprocess. Specifying "JOB_OBJECT_LIMIT_BREAKAWAY_OK" would allow sandboxed processes within the job to break off and form their own, separate job. This is actually necessary for sandboxing to work on older versions of Windows without nested job support, but on newer versions, we don't want them breaking away.
Attachment #9010531 -
Flags: review?(aklotz)
Assignee | ||
Comment 10•6 years ago
|
||
So the patch is actually not complete yet. What we would need in terms of Marionette is that mozprocess would also know about the new and detached process id. In case of Posix we simply update it with the process id which Marionette gets via Firefox: https://dxr.mozilla.org/mozilla-central/rev/a9e339b3e5d8dc3b108d3dc8b85f72b2235fafb2/testing/mozbase/mozprocess/mozprocess/processhandler.py#910 But how can we overwrite the process id (`self.pid`) on Windows? Aaron, is there a simple way of doing that? Or any alternative?
Assignee | ||
Comment 11•6 years ago
|
||
Comment on attachment 9010531 [details] [diff] [review] [mozprocess] Track and kill detached processes on Windows Review of attachment 9010531 [details] [diff] [review]: ----------------------------------------------------------------- Lets try Geoff for the review, so that we can get this important patch landed.
Attachment #9010531 -
Flags: review?(aklotz) → review?(gbrown)
Assignee | ||
Comment 12•6 years ago
|
||
(In reply to Henrik Skupin (:whimboo) from comment #10) > So the patch is actually not complete yet. What we would need in terms of > Marionette is that mozprocess would also know about the new and detached > process id. In case of Posix we simply update it with the process id which > Marionette gets via Firefox: > > https://dxr.mozilla.org/mozilla-central/rev/ > a9e339b3e5d8dc3b108d3dc8b85f72b2235fafb2/testing/mozbase/mozprocess/ > mozprocess/processhandler.py#910 > > But how can we overwrite the process id (`self.pid`) on Windows? Aaron, is > there a simple way of doing that? Or any alternative? I moved this extra part to bug 1493796.
No longer blocks: 1420372
Comment 13•6 years ago
|
||
Comment on attachment 9010531 [details] [diff] [review] [mozprocess] Track and kill detached processes on Windows Review of attachment 9010531 [details] [diff] [review]: ----------------------------------------------------------------- This seems reasonable and consistent with the discussion here.
Attachment #9010531 -
Flags: review?(gbrown) → review+
Comment 14•6 years ago
|
||
Pushed by hskupin@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/afe0beb5f125 [mozprocess] Track and kill detached processes on Windows. r=gbrown
Comment 15•6 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/afe0beb5f125
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
status-firefox64:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla64
Assignee | ||
Comment 16•6 years ago
|
||
While we can still uplift to beta, I would propose to do that for stability of this test-only patch. Thanks.
status-firefox63:
--- → affected
Whiteboard: [checkin-needed-beta]
Comment 17•6 years ago
|
||
bugherder uplift |
https://hg.mozilla.org/releases/mozilla-beta/rev/570f98364ad3
Whiteboard: [checkin-needed-beta]
You need to log in
before you can comment on or make changes to this bug.
Description
•