Open Bug 1493796 Opened Last year Updated 6 months ago

[mozprocess] Allow to update process id on Windows if process has been restarted

Categories

(Testing :: Mozbase, enhancement, P3)

enhancement

Tracking

(Not tracked)

People

(Reporter: whimboo, Unassigned)

References

(Blocks 4 open bugs)

Details

Attachments

(1 file)

(In reply to Henrik Skupin (:whimboo) from bug 1438830 comment #10)
> So the patch is actually not complete yet. What we would need in terms of
> Marionette is that mozprocess would also know about the new and detached
> process id. In case of Posix we simply update it with the process id which
> Marionette gets via Firefox:
> 
> https://dxr.mozilla.org/mozilla-central/rev/
> a9e339b3e5d8dc3b108d3dc8b85f72b2235fafb2/testing/mozbase/mozprocess/
> mozprocess/processhandler.py#910
> 
> But how can we overwrite the process id (`self.pid`) on Windows? Aaron, is
> there a simple way of doing that? Or any alternative?
Aaron, maybe you have an idea? Not sure if we would have to recreate the whole job object, or if there is a simpler solution. Thanks.
Flags: needinfo?(aklotz)
Priority: -- → P3
So there is actually a real problem here. Even we can track the process after a restart of Firefox via the job, a call to `GetExitCodeProcess()` returns 0. With that information we currently assume the process has exited, and consumers are misbehaving. See the perma failures on beta for bug 1496759 and bug 1497062.

This all started to happen with the following commit on bug 1433905, which assumes that the exit code is always correct:
https://hg.mozilla.org/integration/autoland/rev/8793e332890e

I think this happens because the handle would still refer to the old pid of the former process id, but not for the Firefox process after the restart! As such it thinks the process doesn't exist anymore.

To temporarily unblock us from having those failures on central and beta, I will backout the above commit, and disable the affected mozprocess unit test.

Aaron, I hope that you will have a bit of time soon to help us getting this right. If not please also let me know, and I have to dig into all that myself. But I would appreciate at least a direction. Thanks!
Blocks: 1497062, 1496759
Originally landed as changeset 8793e332890e via bug 1433905 the
patch caused a regression because GetExitCodeProcess() returns
0 for an inside of Firefox restarted process.

It can be relanded once the process id of the job object can
successfully be tracked.
I messed-up the patch with a wrong commit from bug 1397612. Given that this one has been reverted now, here another try build:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=748a483a4fb10fbc32337ec0d0d394eec6ebbf0c
Pushed by hskupin@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/9782e0b1657d
[mozprocess] Revert poll() behavior on Windows due to regression. r=gbrown
Status: NEW → ASSIGNED
Keywords: leave-open
Priority: P3 → P2
Sorry for the delay here, I am really slammed at the moment.

I don't know if this is a complete solution, but it is possible to receive notifications from a job object by calling SetInformationJobObject with the JobObjectAssociateCompletionPortInformation information class, and registering an I/O completion port on that job object.

This would allow you to poll for the JOB_OBJECT_MSG_NEW_PROCESS notification, which tells you when a new process has been created within the job. I would think that a browser restart on Windows would result in the newly created browser process still being within the same job, so I *think* that could be enough?

Since this does require you to set up an I/O completion port, it will probably require some extra ctypes work to make happen, but I think it might be your best bet.
Flags: needinfo?(aklotz)
(In reply to Aaron Klotz [:aklotz] from comment #10)
> I would think that a browser restart on Windows would result in the
> newly created browser process still being within the same job, so I *think*
> that could be enough?

I guess the other catch is that you will receive that notification for *all* new processes created within the job, so you also somehow need to filter out which one is the browser.
Duplicate of this bug: 1389095

The leave-open keyword is there and there is no activity for 6 months.
:whimboo, maybe it's time to close this bug?

Flags: needinfo?(hskupin)

No, there is still more work to do here, but I don't have the time doing it at the moment. If anyone has interest feel free to do it.

Assignee: hskupin → nobody
Status: ASSIGNED → NEW
Type: defect → enhancement
Flags: needinfo?(hskupin)
Keywords: leave-open
Priority: P2 → P3
You need to log in before you can comment on or make changes to this bug.