[mozprocess] ProcessHandler.poll() returns invalid process status when process quit itself

RESOLVED FIXED in Firefox 64

Status

enhancement
P1
normal
RESOLVED FIXED
2 years ago
9 months ago

People

(Reporter: whimboo, Assigned: whimboo)

Tracking

(Blocks 3 bugs)

Version 3
mozilla64
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox64 fixed)

Details

Attachments

(6 attachments, 6 obsolete attachments)

46 bytes, text/x-phabricator-request
gbrown
: review+
Details | Review
46 bytes, text/x-phabricator-request
gbrown
: review+
Details | Review
46 bytes, text/x-phabricator-request
gbrown
: review+
Details | Review
46 bytes, text/x-phabricator-request
Details | Review
46 bytes, text/x-phabricator-request
Details | Review
46 bytes, text/x-phabricator-request
Details | Review
While working on bug 1433873 I found that the poll() method on the ProcessHandler class returns an invalid value for the process status when the process has quit itself.

The reason are those two lines:
https://dxr.mozilla.org/mozilla-central/rev/e22e2c9eb81686e958a9448ea3d1e8cd766743e2/testing/mozbase/mozprocess/mozprocess/processhandler.py#811-812

Those have been added about 4 years ago by myself on bug 965326.

Basically this doesn't work because our `Process` class as inherited from `subprocess.Popen` always has this property set.

So once the process got started the poll method of the `ProcessHandler` will return None, even when the application already quit.
The problem here simply is that `poll` never tries to join the output reader thread. So as long as `wait` is not called, `poll` will always return None. Compared to `wait` it should not try to join the reader thread in a loop, but at least do it once.

First try build looks pretty good:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=a2cae4bbed129e8cc516d7b5349d3e8ddc560193

I finish the patch and have it up on mozreview soon.
Somehow the Marionette unit tests on Android fail now:

https://treeherder.mozilla.org/logviewer.html#?job_id=159146838&repo=try

[task 2018-01-29T21:07:26.311Z] 21:07:26     INFO -  {"thread": "MainThread", "level": "ERROR", "pid": 1006, "source": "Marionette test runner", "time": 1517260046310, "action": "log", "message": "Failure during harness execution", "stack": "Traceback (most recent call last):\n\n  File \"/builds/worker/workspace/build/tests/marionette/harness/marionette_harness/runtests.py\", line 94, in cli\n    failed = harness_instance.run()\n\n  File \"/builds/worker/workspace/build/tests/marionette/harness/marionette_harness/runtests.py\", line 74, in run\n    runner.run_tests(tests)\n\n  File \"/builds/worker/workspace/build/venv/lib/python2.7/site-packages/marionette_harness/runner/base.py\", line 863, in run_tests\n    self.fixture_servers = self.start_fixture_servers()\n\n  File \"/builds/worker/workspace/build/venv/lib/python2.7/site-packages/marionette_harness/runner/base.py\", line 964, in start_fixture_servers\n    if self.appName == \"fennec\":\n\n  File \"/builds/worker/workspace/build/venv/lib/python2.7/site-packages/marionette_harness/runner/base.py\", line 682, in appName\n    self._appName = self.capabilities.get('browserName')\n\n  File \"/builds/worker/workspace/build/venv/lib/python2.7/site-packages/marionette_harness/runner/base.py\", line 672, in capabilities\n    self.marionette.start_session()\n\n  File \"/builds/worker/workspace/build/venv/lib/python2.7/site-packages/marionette_driver/decorators.py\", line 26, in _\n    return func(*args, **kwargs)\n\n  File \"/builds/worker/workspace/build/venv/lib/python2.7/site-packages/marionette_driver/marionette.py\", line 1211, in start_session\n    self.start_binary(timeout)\n\n  File \"/builds/worker/workspace/build/venv/lib/python2.7/site-packages/marionette_driver/marionette.py\", line 635, in start_binary\n    self.instance.start()\n\n  File \"/builds/worker/workspace/build/venv/lib/python2.7/site-packages/marionette_driver/geckoinstance.py\", line 415, in start\n    self._update_profile(self.profile)\n\n  File \"/builds/worker/workspace/build/venv/lib/python2.7/site-packages/marionette_driver/geckoinstance.py\", line 189, in _update_profile\n    raise errors.MarionetteException(\"The current profile can only be updated \"\n\nMarionetteException: The current profile can only be updated when the instance is not running\n"}

It's something I will have to check tomorrow.
Here the better formatted traceback from Python:

> Traceback (most recent call last):
> 
>   File \"/builds/worker/workspace/build/tests/marionette/harness/marionette_harness/runtests.py\", line 94, in cli
>     failed = harness_instance.run()
> 
>   File \"/builds/worker/workspace/build/tests/marionette/harness/marionette_harness/runtests.py\", line 74, in run
>     runner.run_tests(tests)
> 
>   File \"/builds/worker/workspace/build/venv/lib/python2.7/site-packages/marionette_harness/runner/base.py\", line 863, in run_tests
>     self.fixture_servers = self.start_fixture_servers()
> 
>   File \"/builds/worker/workspace/build/venv/lib/python2.7/site-packages/marionette_harness/runner/base.py\", line 964, in start_fixture_servers
>     if self.appName == \"fennec\":
> 
>   File \"/builds/worker/workspace/build/venv/lib/python2.7/site-packages/marionette_harness/runner/base.py\", line 682, in appName
>     self._appName = self.capabilities.get('browserName')
> 
>   File \"/builds/worker/workspace/build/venv/lib/python2.7/site-packages/marionette_harness/runner/base.py\", line 672, in capabilities
>     self.marionette.start_session()
> 
>   File \"/builds/worker/workspace/build/venv/lib/python2.7/site-packages/marionette_driver/decorators.py\", line 26, in _
>     return func(*args, **kwargs)
> 
>   File \"/builds/worker/workspace/build/venv/lib/python2.7/site-packages/marionette_driver/marionette.py\", line 1211, in start_session
>     self.start_binary(timeout)
> 
>   File \"/builds/worker/workspace/build/venv/lib/python2.7/site-packages/marionette_driver/marionette.py\", line 635, in start_binary
>     self.instance.start()
> 
>   File \"/builds/worker/workspace/build/venv/lib/python2.7/site-packages/marionette_driver/geckoinstance.py\", line 415, in start
>     self._update_profile(self.profile)
> 
>   File \"/builds/worker/workspace/build/venv/lib/python2.7/site-packages/marionette_driver/geckoinstance.py\", line 189, in _update_profile
>     raise errors.MarionetteException(\"The current profile can only be updated \"
> 
> MarionetteException: The current profile can only be updated when the instance is not running

What I don't understand yet is how this is related to this patch but not to bug 1429759, which landed a couple of days ago.
Actually the problem here is that for Fennec Marionette uses `adb` to install and execute the binary. Sadly `adb` forks itself and leaves behind a zombie process. When Marionette checks for the process status via the mozrunner.returncode (uses mozprocess.poll() internally) we join the reader threads because no more output is coming from this zombie process, and return 0 for a normal exit. But that's not true, because adb is still running.

I'm not sure how to continue here for the moment. I will file a mozprocess bug in the hope that we can get better tracking support for forked (detached) processes.
The command as used on Android to start the application is:

/builds/worker/workspace/build/android-sdk-linux/platform-tools/adb -s emulator-5554 shell am start -a android.activity.MAIN -n org.mozilla.fennec_aurora/org.mozilla.gecko.BrowserApp --es args '-no-remote -profile /storage/sdcard/tests/profile -marionette' --es env0 MOZ_CRASHREPORTER=1 --es env1 R_LOG_VERBOSE=1 --es env2 MOZ_HIDE_RESULTS_TABLE=1 --es env3 MOZ_LOG=signaling:3,mtransport:4,DataChannel:4,jsep:4,MediaPipelineFactory:4 --es env4 MOZ_CRASHREPORTER_SHUTDOWN=1 --es env5 R_LOG_DESTINATION=stderr --es env6 MOZ_CRASHREPORTER_NO_REPORT=1 --es env7 NO_EM_RESTART=1 --es env8 MOZ_PROCESS_LOG=/tmp/tmpbGGTxNpidlog --es env9 R_LOG_LEVEL=6

Starting the application actually happens in launch_application:

https://dxr.mozilla.org/mozilla-central/rev/0d806b3230fe4767fa70cdee57db87d1e9a5ba49/testing/mozbase/mozdevice/mozdevice/adb_android.py#304

Later in Marionette we try to check the status of Fennec via `wait()` in my patch. But there is no override for that method in device.py of mozrunner, and as such we check the process status of adb on the host. This is bad because adb runs as daemon at this point, and the exit code is 0!

Only `is_running` would work because it has custom code:

https://dxr.mozilla.org/mozilla-central/rev/0d806b3230fe4767fa70cdee57db87d1e9a5ba49/testing/mozbase/mozrunner/mozrunner/base/device.py#124

So I really wonder why all the time both `wait()` and `poll()` in mozrunner specifically for device instances do not have custom code, and falsely evaluate the host process (adb) for the exit status.

Bob and Geoff, do you have some input here?
Flags: needinfo?(gbrown)
Flags: needinfo?(bob)
Other than Marionette, all the other emulator unit tests use different code for this sort of thing. Notably, they do not use mozrunner/base/device.py. Also, they use devicemanagerADB.py instead of adb_android.py, although I suspect adb_android.py is superior to devicemanager now, and I've been contemplating porting from devicemanager to adb_android.py.

In general, in devicemanager, we might wait for an adb command to complete and might even check for its return code, but that's just to catch things like a crash in adb or a missing adb binary. Usually, we also check the output from the adb command.

Launching Firefox is even trickier. "am start" launches Firefox, but doesn't wait for it to complete. "am start -W" will wait for the application to launch, but I'm not sure if that really waits for Firefox to complete. In the past, we waited for Firefox to complete by polling "adb shell ps", but the mobile dev team had some issue with this. 

Now, we check that Firefox is the top (foreground) activity -- if it isn't, tests are complete and our wait is over:

https://dxr.mozilla.org/mozilla-central/rev/0d806b3230fe4767fa70cdee57db87d1e9a5ba49/build/mobile/remoteautomation.py#375-412
Flags: needinfo?(gbrown)
(In reply to Geoff Brown [:gbrown] from comment #9)
> Other than Marionette, all the other emulator unit tests use different code
> for this sort of thing. Notably, they do not use mozrunner/base/device.py.
> Also, they use devicemanagerADB.py instead of adb_android.py, although I
> suspect adb_android.py is superior to devicemanager now, and I've been
> contemplating porting from devicemanager to adb_android.py.

Oh, I was wrong. FennecRunner doesn't use adb_android and as it looks like only parts from devicemanagerADB, but mostly DroidADB (`self.dm_class = DroidADB`)! Wow, why do we have so many different implementations for Android and adb?!

Also the command property seems to get superseeded by mozrunner itself:
https://dxr.mozilla.org/mozilla-central/rev/0d806b3230fe4767fa70cdee57db87d1e9a5ba49/testing/mozbase/mozrunner/mozrunner/base/device.py#171

So I assume I should have a look at device.py and DroidADB in what could be possible to improve.

> Launching Firefox is even trickier. "am start" launches Firefox, but doesn't
> wait for it to complete. "am start -W" will wait for the application to
> launch, but I'm not sure if that really waits for Firefox to complete. In
> the past, we waited for Firefox to complete by polling "adb shell ps", but
> the mobile dev team had some issue with this. 

Hm, it would be good to know what -W is actually doing. If this causes adb not to start itself as daemon that would be good. But then it really would have to wait until the application has quit.

> Now, we check that Firefox is the top (foreground) activity -- if it isn't,
> tests are complete and our wait is over:
> 
> https://dxr.mozilla.org/mozilla-central/rev/
> 0d806b3230fe4767fa70cdee57db87d1e9a5ba49/build/mobile/remoteautomation.
> py#375-412

Might not be ideal, a command like the following would also check if the app is running in the background (some tests might want to do it in the future) - but well, right now it should be enough.

> adb shell dumpsys activity activities | grep 'org.mozilla.firefox_beta/org.mozilla.gecko.BrowserApp' | grep Hist

Ok, so it means I will try the following now:

1) Use -W for starting the activity - if it works fine
2) Override the `poll` and `wait` methods for FennecRunner to return the remote process status
(In reply to Henrik Skupin (:whimboo) from comment #10)
> Oh, I was wrong. FennecRunner doesn't use adb_android and as it looks like
> only parts from devicemanagerADB, but mostly DroidADB (`self.dm_class =
> DroidADB`)! Wow, why do we have so many different implementations for
> Android and adb?!

While digging more into this I have seen that DroidADB is actually a subclass from DeviceManagerADB and DroidMixin.

> 1) Use -W for starting the activity - if it works fine

I checked that now, and as we already documented in the code -W only waits until the activity has been started. So it doesn't help me here.

> 2) Override the `poll` and `wait` methods for FennecRunner to return the
> remote process status

Looks like wrappers for both methods are necessary, which would rely on the `getTopActivity()`, and return an appropriate exit code.
Flags: needinfo?(bob)
I triggered a try build now that the transition to adb has been finished. It looks all fine on Linux, MacOS, and Android. But there are perma failures for Windows:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=9a4572d2d102e8fff0f7490023f830aea7fcb855

> AssertionError: Positive returncode expected, got "None"

After some investigation I noticed that we have a bogus process still running detection for Windows:

>     # If we have a handle, the process is alive
>     if isWin and getattr(self, '_handle', None):
>         return None

This doesn't make sense and indeed will cause us to always return None if the process got killed by the system, or crashed.
The latest changes look promising. No failure yet:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=a1e4c3114c69b5304e552c561c6f85804a7537a3

I added a couple of more tests for mozprocess to make sure that this change doesn't regress something.

A full try build is still outstanding and I will trigger it after uploading the most recent changes.
The test browser\components\migration\tests\marionette\test_refresh_firefox.py fails on Windows 10 64bit opt only so far. Not sure about Windows 7 because the tests aren't done yet.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=a1e4c3114c69b5304e552c561c6f85804a7537a3&selectedJob=180735820

> TEST-UNEXPECTED-ERROR | browser\components\migration\tests\marionette\test_refresh_firefox.py TestFirefoxRefresh.testReset | IOError: Process has been unexpectedly closed (Exit code: 0) (Reason: Requested restart of the application was aborted)

Given that all the other still-enabled Marionette restart tests work, I wonder if that is a problem caused by the test code. I will check that tomorrow.
So the failure with test_refresh_firefox.py is troublesome and I don't know why the Firefox exists in-between the restart. Due to my change on this bug `raise_for_port()` will return early due to the exit code of 0. I don't know why Firefox exists the io completion port in this case, but I feel it's too much work for this bug. As such I created bug 1465715 and will re-enable the early return then.

With the above change I can still see a failure in cleaning up the files in test_refresh_firefox.py. I hope that isn't too hard to fix.
So it's getting more complicated but finally I might know why it's failing. The test `test_refresh_firefox.exe` triggers a refresh of the user profile, and while Firefox is doing that I can see the following in the log:

> IO Completion Port unexpectedly closed

It means the IO completion port mozprocess is using is gone. Given the exit code of `0` for the tracked Fireefox process my assumption would be that Firefox closes, or forks itself in a very weird way that it escapes the io completion port. As of now mozprocess cannot handle detached processes on Windows yet, and bug 1438830 tracks that work.

Gijs or Matt, I'm fairly sure one of you will be able to shed some lights on that problem and could point me to the code which is related to the strange restart. I would like to understand what it does, so that I can create a similar binary to be used for testing of mozprocess. Thanks.
Depends on: 1438830
Flags: needinfo?(gijskruitbosch+bugs)
Flags: needinfo?(MattN+bmo)
(In reply to Henrik Skupin (:whimboo) from comment #21)
> So it's getting more complicated but finally I might know why it's failing.
> The test `test_refresh_firefox.exe`

Is this just a typo? This is a python test...

> triggers a refresh of the user profile,
> and while Firefox is doing that I can see the following in the log:
> 
> > IO Completion Port unexpectedly closed
> 
> It means the IO completion port mozprocess is using is gone. Given the exit
> code of `0` for the tracked Fireefox process my assumption would be that
> Firefox closes, or forks itself in a very weird way that it escapes the io
> completion port. As of now mozprocess cannot handle detached processes on
> Windows yet, and bug 1438830 tracks that work.
> 
> Gijs or Matt, I'm fairly sure one of you will be able to shed some lights on
> that problem and could point me to the code which is related to the strange
> restart. I would like to understand what it does, so that I can create a
> similar binary to be used for testing of mozprocess. Thanks.

I'm not really sure what you need to know here. The reset is at:

https://searchfox.org/mozilla-central/source/browser/components/migration/tests/marionette/test_refresh_firefox.py#502-530

which does:

        self.marionette.restart(clean=False, in_app=True)

after setting a bunch of environment variables. Does that help? If not, please be more specific about what you need. I'm afraid I'm not personally familiar with how Marionette implements the .restart() and how that'd relate to what you're doing with mozprocess.
Flags: needinfo?(hskupin)
Flags: needinfo?(gijskruitbosch+bugs)
Flags: needinfo?(MattN+bmo)
Gijs, what I basically want to know is not how Marionette handles restarts, because I completely implemented that, but what Firefox actually does here. The way how it restarts itself seem to be different to usual restarts which do not trigger those io completion port issues.
Flags: needinfo?(hskupin) → needinfo?(gijskruitbosch+bugs)
(In reply to Henrik Skupin (:whimboo) from comment #23)
> Gijs, what I basically want to know is not how Marionette handles restarts,
> because I completely implemented that, but what Firefox actually does here.
> The way how it restarts itself seem to be different to usual restarts which
> do not trigger those io completion port issues.

What is the "IO Completion Port"?

We detect the env vars here:

https://dxr.mozilla.org/mozilla-central/rev/d8f180ab74921fd07a66d6868914a48e5f9ea797/toolkit/xre/nsAppRunner.cpp#2261-2290

and then switch to a new profile in

https://dxr.mozilla.org/mozilla-central/rev/d8f180ab74921fd07a66d6868914a48e5f9ea797/toolkit/xre/nsAppRunner.cpp#2314-2338

Does that help?
Flags: needinfo?(gijskruitbosch+bugs)
(In reply to :Gijs (he/him) from comment #24)
> What is the "IO Completion Port"?

It's a feature on Windows. Details can be found at https://msdn.microsoft.com/en-us/library/windows/desktop/aa365198(v=vs.85).aspx.

> https://dxr.mozilla.org/mozilla-central/rev/
> d8f180ab74921fd07a66d6868914a48e5f9ea797/toolkit/xre/nsAppRunner.cpp#2261-
> 2290
> 
> https://dxr.mozilla.org/mozilla-central/rev/
> d8f180ab74921fd07a66d6868914a48e5f9ea797/toolkit/xre/nsAppRunner.cpp#2314-
> 2338

Both the links do not actually show where the restart of Firefox is triggered, and how this is done. Is that a call to `Services.startup.quit()` and somewhere hidden in one of the called methods?
Flags: needinfo?(gijskruitbosch+bugs)
LaunchChild may be worth looking at as it's where the actual restart happens IIUC.
https://dxr.mozilla.org/mozilla-central/rev/d8f180ab74921fd07a66d6868914a48e5f9ea797/toolkit/xre/nsAppRunner.cpp#1815-1816,1845,1852-1858

I also don't really know how I can help since all the relevant code is in the link from comment 22… you can check if the restarts arguments or environment variables change the nsAppRunner code path.
(In reply to Henrik Skupin (:whimboo) from comment #25)
> (In reply to :Gijs (he/him) from comment #24)
> > What is the "IO Completion Port"?
> 
> It's a feature on Windows. Details can be found at
> https://msdn.microsoft.com/en-us/library/windows/desktop/aa365198(v=vs.85).
> aspx.

OK. What does mozprocess create these on/for ? The process? The lockfile in the profile? Something else?

> > https://dxr.mozilla.org/mozilla-central/rev/
> > d8f180ab74921fd07a66d6868914a48e5f9ea797/toolkit/xre/nsAppRunner.cpp#2261-
> > 2290
> > 
> > https://dxr.mozilla.org/mozilla-central/rev/
> > d8f180ab74921fd07a66d6868914a48e5f9ea797/toolkit/xre/nsAppRunner.cpp#2314-
> > 2338
> 
> Both the links do not actually show where the restart of Firefox is
> triggered, and how this is done. Is that a call to `Services.startup.quit()`
> and somewhere hidden in one of the called methods?

As Matt implied in comment #25, it's the call to the marionette .restart() function quoted in comment #22 which I assume eventually call startup.quit() with restart flags (but I haven't checked) - but in comment #23 you said you implemented it, so I also assume you know better than me how those restarts work and whether they do or do not call startup.quit()...

There's nothing magical here. The only times that test will restart Firefox is when it calls marionette's .restart() method. The only difference with regular restarts will be how those environment variables change the restart path, which is why I linked you to the nsAppRunner code where that happens...


Do we actually have other (unit?) tests on infra that exercise marionette's .restart() method and don't have this error?
Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(hskupin)
(In reply to :Gijs (he/him) from comment #27)
> OK. What does mozprocess create these on/for ? The process? The lockfile in
> the profile? Something else?

It's for the process and nothing else as far as I understood. The code is kinda old and I haven't really touched it before. So my knowledge here is most likely equal to yours.

> As Matt implied in comment #25, it's the call to the marionette .restart()
> function quoted in comment #22 which I assume eventually call startup.quit()
> with restart flags (but I haven't checked) - but in comment #23 you said you
> implemented it, so I also assume you know better than me how those restarts
> work and whether they do or do not call startup.quit()...

Yes, it calls startup.quit() in case of in_app restarts. But I don't know what this method actually performs internally, additionally to the parameters we pass-in. So maybe the best would be to create a simplified testcase based on the parameters and environment variables which get set for a reset Firefox activity. Maybe I can reproduce it that way.

> Do we actually have other (unit?) tests on infra that exercise marionette's
> .restart() method and don't have this error?

Yes, we have a ton of in_app restart tests. But when I now checked the full log of the last try push on comment 19 I can see that lots of them are marked as skipped. I can only imagine that something has been changed in Marionette so that the following condition evaluates to true:

>        if self.marionette.session_capabilities["platformName"] != "windows_nt":
>            skip("Bug 1363368 - Wrong window handles after in_app restarts")

If that is true we would indeed not run any of them across all platforms, and it would explain why we only see it for the refresh Firefox Marionette test. Maybe when fixing this we would also see it for those unit tests.

If that is the case sorry for the noise. But thanks for all the feedback which gives me an opportunity to dive into and create a test application for mozprocess unit tests.
Flags: needinfo?(hskupin)
Now with the in_app restart tests enabled again I did another try push, and I now can also see problems with the Marionette unit tests. So I definitely have an easier starting point now.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=914fd980b4070fc7f03d7233947d1aa454420eae&selectedJob=182670952
Here a new try build given that the dependency has been fixed now:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=0741572eb73882aea5016062a1ad6addf2baa6ec&selectedJob=202566667

Interesting the newly added test for poll() after os.kill() fails. I'm not sure yet why, given that os.waitpid() returns with sig as exit code, but ProcessHandler.poll() still returns None if run immediately. Using time.sleep(1) before, makes it all work.

I will have to see how to fix this race by hopefully getting away the call to sleep()
The delay here actually comes from checking for the reader thread to be alive, which gets executed each 0.02s. As such the first call to poll() will return None, unless there is a delay of at least 0.02s before.

I will solve this by simply waiting twice that period which should allow the thread to run at least once.
Blocks: 1495667
It all works fine now except for a single Marionette unit restart test, which I would like to temporary skip for opt/pgo Windows builds only. The necessary work to get it re-enabled is tracked on bug 1495667, and I will work on once bug 1433873 has been fixed.

Here the last try build:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=328703f839595e664942df5c7c645c3cb2eebd3d&selectedJob=202811804
Marionette currently doesn't handle restarts of Firefox that well,
and as such needs a fix on bug 1433873. Sadly this patch cannot be
landed until the mozprocess issues on bug 1433905 have been fixed.

This patch temporarily disables the test for opt/pgo Windows builds
only. Re-enabling is tracked via bug 1495667.
Instead of an AttributeError a RuntimeError has to be thrown if
the underlying process hasn't been created yet.

Depends on D7392
Calling "check_for_detached()" doesn't make sense if the process
hasn't been started yet, and as such has to raise a RuntimeError.

Depends on D7393
The assumption that when a handle is present for the process handler
on Windows doesn't mean that the process is still alive. It could
have already been externally killed, crashed, or closed itself.

The patch makes sure to check the process exit code, and run
clean-up steps if the process is already gone.

Depends on D7395
If the process quits externally (shutdown by itself or via kill),
the poll method still returns None, even with the process not
existent anymore.

To fix that, the poll method should at least try to join the reader
thread once before checking if it is still alive. Otherwise it will
continue to run, and never stop.

Also the attribute existence check for "returncode" on the process
instance has to be removed because this property always exists.
Instead a check for the "returncode" property of the ProcessHandler
class is necessary.

Depends on D7396
Mn tests for opt builds look good now:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=e302eae80ce996d55eff600da02a1ad051df563f

But there are problems with pgo builds due to the artifact mode. Here another try build for those, which is still running at the moment:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=da14651308ed61c63c0b9d857fa9c836db0e67b5
Comment on attachment 9013546 [details]
Bug 1433905 - Temporarily disable test_loader_global_sharing.py for opt/pgo builds on Windows. r?gbrown

Geoff Brown [:gbrown] has approved the revision.
Attachment #9013546 - Flags: review+
Comment on attachment 9013547 [details]
Bug 1433905 - [mozprocess] Retrieving pid has to fail with RuntimeError if process hasn't been started yet. r?gbrown

Geoff Brown [:gbrown] has approved the revision.
Attachment #9013547 - Flags: review+
Comment on attachment 9013548 [details]
Bug 1433905 - [mozprocess] "check_for_detached()" has to raise RuntimeError if process hasn't been started yet. r?gbrown

Geoff Brown [:gbrown] has approved the revision.
Attachment #9013548 - Flags: review+
Sorry, I wasn't able to finish off these reviews today. Will make it a priority tomorrow.
Flags: needinfo?(gbrown)
Flags: needinfo?(gbrown)
Pushed by hskupin@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/051b1c82dac8
Temporarily disable test_loader_global_sharing.py for opt/pgo builds on Windows. r=gbrown
https://hg.mozilla.org/integration/autoland/rev/4f05d89da460
[mozprocess] Retrieving pid has to fail with RuntimeError if process hasn't been started yet. r=gbrown
https://hg.mozilla.org/integration/autoland/rev/70607539582d
[mozprocess] "check_for_detached()" has to raise RuntimeError if process hasn't been started yet. r=gbrown
https://hg.mozilla.org/integration/autoland/rev/a9f6938eccfe
[mozprocess] Fix broken path to process script for test_process_output_nonewline. r=gbrown
https://hg.mozilla.org/integration/autoland/rev/8793e332890e
[mozprocess] Existence of _handle on Windows doesn't mean the process is still alive. r=gbrown
https://hg.mozilla.org/integration/autoland/rev/21e66bfa0cc4
[mozprocess] poll() always returns None for stopped process until wait() is called. r=gbrown
You need to log in before you can comment on or make changes to this bug.