Closed Bug 1471908 Opened 6 years ago Closed 6 years ago

'OSError' when raptor shutsdown the browser on win32/64

Categories

(Testing :: Raptor, defect)

Version 3
defect
Not set
normal

Tracking

(firefox63 fixed)

RESOLVED FIXED
mozilla63
Tracking Status
firefox63 --- fixed

People

(Reporter: rwood, Assigned: jmaher)

References

Details

Attachments

(1 file)

When raptor is finished and shutsdown Firefox on Windows, there is a strange mozprocess error.

This is where raptor starts up Firefox via mozrunner:

https://searchfox.org/mozilla-central/rev/d2966246905102b36ef5221b0e3cbccf7ea15a86/testing/raptor/raptor/raptor.py#135

This is where raptor kills Firefox:

https://searchfox.org/mozilla-central/rev/d2966246905102b36ef5221b0e3cbccf7ea15a86/testing/raptor/raptor/control_server.py#113

And here is the stack trace from Win7 on production (same for Win10) with MOZPROCESS_DEBUG=1:

22:35:14     INFO -  raptor-control-server received webext_status: __raptor_shutdownBrowser
22:35:14     INFO -  raptor-control-server shutting down browser (pid: 2648)
22:35:14    ERROR -  PID 2648 | JavaScript error: resource://gre/modules/BrowserUtils.jsm, line 428: TypeError: docShell is null
22:35:14     INFO -  PID 2648 | [Child 6396, Chrome_ChildThread] WARNING: pipe error: 109: file z:/build/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 346
22:38:50     INFO -  DBG::MOZPROC PID:2648 (Thread-4) | process id 6396 exited normally
22:38:50     INFO -  DBG::MOZPROC PID:2648 (Thread-5) | calling TerminateJobObject
22:38:50     INFO -  DBG::MOZPROC PID:2648 (Thread-5) | waiting with IO completion port
22:38:50     INFO -  DBG::MOZPROC PID:2648 (MainThread) | waiting with IO completion port
22:38:50     INFO -  DBG::MOZPROC PID:2648 (Thread-4) | job object msg active processes zero
22:38:50     INFO -  DBG::MOZPROC PID:2648 (Thread-5) | received 'FINISHED' from _procmgrthread
22:38:50    ERROR -  Traceback (most recent call last):
22:38:50     INFO -    File "C:\Users\task_1530138066\build\venv\lib\site-packages\mozprocess\processhandler.py", line 521, in _wait
22:38:50     INFO -      self.MAX_PROCESS_KILL_DELAY)
22:38:50     INFO -    File "c:\mozilla-build\python\Lib\Queue.py", line 176, in get
22:38:50     INFO -      raise Empty
22:38:50     INFO -  Empty
22:38:50    ERROR -  Traceback (most recent call last):
22:38:50     INFO -    File "C:\Users\task_1530138066\build\tests\raptor\raptor\raptor.py", line 223, in <module>
22:38:50     INFO -      main()
22:38:50     INFO -    File "C:\Users\task_1530138066\build\tests\raptor\raptor\raptor.py", line 211, in main
22:38:50     INFO -      raptor.run_test(next_test)
22:38:50     INFO -    File "C:\Users\task_1530138066\build\tests\raptor\raptor\raptor.py", line 144, in run_test
22:38:50     INFO -      self.runner.wait(timeout)
22:38:50     INFO -    File "C:\Users\task_1530138066\build\venv\lib\site-packages\mozrunner\base\runner.py", line 161, in wait
22:38:50     INFO -      self.process_handler.wait(timeout)
22:38:50     INFO -    File "C:\Users\task_1530138066\build\venv\lib\site-packages\mozprocess\processhandler.py", line 863, in wait
22:38:50     INFO -      self.returncode = self.proc.wait()
22:38:50     INFO -    File "C:\Users\task_1530138066\build\venv\lib\site-packages\mozprocess\processhandler.py", line 221, in wait
22:38:50     INFO -      self.returncode = self._wait()
22:38:50     INFO -    File "C:\Users\task_1530138066\build\venv\lib\site-packages\mozprocess\processhandler.py", line 527, in _wait
22:38:50     INFO -      raise OSError("IO Completion Port failed to signal process shutdown")
22:38:50     INFO -  OSError: IO Completion Port failed to signal process shutdown
DBG::MOZPROC PID:6204 (Thread-2) | process id 6204 exited normally
WARNING | IO Completion Port failed to signal process shutdown
Parent process 6204 exited with children alive:
PIDS: 2888, 2776, 4412
Attempting to kill them, but no guarantee of success
DBG::MOZPROC PID:6204 (Thread-2) | calling TerminateJobObject
DBG::MOZPROC PID:6204 (Thread-2) | waiting without IO completion port
DBG::MOZPROC PID:6204 (Thread-2) | NOT USING JOB OBJECTS!!!
22:41:55    ERROR - Return code: 1
Hey Henrik, do you think this could possibly be somehow related to Bug 1433905?
Flags: needinfo?(hskupin)
Can you please point me to the Raptor code which handles the shutdown of the process? Hard to tell right now, and I haven't had the time yet to dig into the IO completion port issues.
Flags: needinfo?(hskupin)
(In reply to Henrik Skupin (:whimboo) from comment #2)

> Can you please point me to the Raptor code which handles the shutdown of the
> process? Hard to tell right now, and I haven't had the time yet to dig into
> the IO completion port issues.

Yessir, thanks:

https://searchfox.org/mozilla-central/rev/d2966246905102b36ef5221b0e3cbccf7ea15a86/testing/raptor/raptor/control_server.py#113
You wait only 5 seconds for the browser to quit? Well, this is something you will have to change. We face a lot of shutdowns of Firefox which are longer than 15s, and in some cases even 65s which then get killed by the background hang monitor. So I would suggest that you wait at maximum 75s for letting Firefox kill itself in the worst scenario. This is what we also do with Marionette.

So currently you trigger a browser quit, but the script is most likely killing the browser during it is shutting down. Not sure which side-effects this actually has for content processes. And if this is the output of mozprocess which you are seeing here.

Try to increase the timeout and if it is still happening let me know and provide another log.
Not sure how often this failed, but the try looks fine, also with the additional 10 jobs which I triggered for you.
(In reply to Henrik Skupin (:whimboo) from comment #6)
> Not sure how often this failed, but the try looks fine, also with the
> additional 10 jobs which I triggered for you.

Hey, thanks - raptor is tier 3 - the green 'rap' tests are only the unit tests. When you turn on tier 3 you'll see the failures:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=9632a14c8f13065a05221ca7bf9ab25bcf5db24c&filter-tier=1&filter-tier=2&filter-tier=3

Unfortunately extending the timeout didn't seem to help, same OSError (it is intermittent). I just did another push to try with same timeout=75 but with MOZPROCESS_DEBUG=1:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=80e2c68eca28e9c52e068da2814b3a9ea3874cf3&filter-tier=1&filter-tier=2&filter-tier=3
Looking into this further, mitmproxy playback is still happening after we try to shutdown the browser. There are a bunch of Firefox child processes still running that aren't exiting when we try to shutdown the browser - perhaps these have to do with mitmproxy still running and posting to the proxy but not sure, looking into this further. Will try stopping mitmproxy playback before shutting down the browser and see if that helps.
So how do you instantiate the ProcessHandler? Do you follow child processes or not? If not, it would explain the failure. Also how do you instantiate the browser shutdown via the extension?
Wait, :jmaher reminded me this issue happens with all the tests not just tp6, and the other tests don't use mitmproxy. Next guess is maybe some child processes left over from receiving/posting with control server... funny how this only happens on Windows.
(In reply to Henrik Skupin (:whimboo) from comment #11)
> So how do you instantiate the ProcessHandler? Do you follow child processes
> or not? If not, it would explain the failure. Also how do you instantiate
> the browser shutdown via the extension?

The browser is not shutdown via the extension (not possible) the extension sends a message to the raptor control server and it terminates the browser process.

The browser process is instantiated here via mozrunner, I have no idea re: following child processes, good question...

https://searchfox.org/mozilla-central/rev/d2966246905102b36ef5221b0e3cbccf7ea15a86/testing/raptor/raptor/raptor.py#135
we should also look carefully at what other harnesses do- mochitest, web-platform, talos; these all are not subject to this, but if they all shut down the browser from inside the addon/extension, then we need to solve that.  It would be good to find a toolchain that terminates the browser from python and then see if we can make raptor work like this.
> (In reply to Henrik Skupin (:whimboo) from comment #11)

> > So how do you instantiate the ProcessHandler? Do you follow child processes
> > or not? If not, it would explain the failure. Also how do you instantiate
> > the browser shutdown via the extension?

ignore_children is False by default (just for fun I tried setting ignore_chilren=True in the process args - when the runner instantiates the process [1] and then the browser doesn't shutdown at all on OSX.

[1] https://searchfox.org/mozilla-central/rev/d2966246905102b36ef5221b0e3cbccf7ea15a86/testing/raptor/raptor/raptor.py#72
(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #14)
> we should also look carefully at what other harnesses do- mochitest,
> web-platform, talos; these all are not subject to this, but if they all shut
> down the browser from inside the addon/extension, then we need to solve
> that.  It would be good to find a toolchain that terminates the browser from
> python and then see if we can make raptor work like this.

Yeah I'll have a look at other frameworks, there must be others that also use mozrunner and mozprocess to start and stop the browser externally.
Also note this is only happening on Windows
Summary: 'OSError' when raptor shutsdown the browser → 'OSError' when raptor shutsdown the browser on win32/64
Please note that other harnesses like mochitest are not using a webextension but are still a legacy extension which has extended privileges including using the Gecko API.

I wonder if you could use Marinoette to start/stop the browser and to install the raptor extension. Marionette can safely shutdown the browser.
Blocks: 1449180
raptor has two calls to proc.wait() and this isn't needed.  We can monitor the browser_proc and timeout if we go too long.  Also this fixes the timeout that we use since it was never used to timeout the browser process.
Comment on attachment 8990025 [details]
Bug 1471908 - 'OSError' when raptor shutsdown the browser on win32/64. r=rwood

Robert Wood [:rwood] has approved the revision.

https://phabricator.services.mozilla.com/D1972
Attachment #8990025 - Flags: review+
Pushed by jmaher@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/1ca992fccdc7
'OSError' when raptor shutsdown the browser on win32/64. r=rwood
https://hg.mozilla.org/mozilla-central/rev/1ca992fccdc7
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
Assignee: nobody → jmaher
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: