Open Bug 1449482 Opened 7 years ago Updated 2 years ago

Rewrite serve.py to use multiprocessing.Queue instead of multiprocessing.Pipe

Categories

(Testing :: Marionette Client and Harness, enhancement, P3)

Tracking

(Not tracked)

People

(Reporter: whimboo, Unassigned)

Details

This can be seen here: https://treeherder.mozilla.org/logviewer.html#?job_id=170701741&repo=mozilla-inbound&lineNumber=750-774

Whenever an exception is raised in the FixtureServer code, we hang forever. I can easily reproduce it locally. This is another reason why we see bug 1391545 in automation.
So the problem here happens in `get_url()`: https://dxr.mozilla.org/mozilla-central/rev/b906009d875d1f5d29b0d1252cdb43a9b1a5889c/testing/marionette/harness/marionette_harness/runner/serve.py#159

The parent process sends the request to the child. The child did not start up correctly because of the exception in `init_func`, and Python's multiprocessing module has already called `os._exit()` on it. As a result, the parent's call to `recv` hangs and never returns: https://dxr.mozilla.org/mozilla-central/rev/b906009d875d1f5d29b0d1252cdb43a9b1a5889c/testing/marionette/harness/marionette_harness/runner/serve.py#54

Based on the docs I would expect an `EOFError` to be raised, but that doesn't happen and we just hang: https://docs.python.org/2/library/multiprocessing.html#multiprocessing.Connection.recv

Andreas, do you have an idea what's going on here? Could this be a bug in the multiprocessing module?

If we called `init_func` earlier, to ensure that the child is properly running a server instance before any request is sent to it, that would fix the hang and cause an immediate abort instead.
Flags: needinfo?(ato)
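One plausible explanation for the missing `EOFError`, not verified against serve.py: `recv()` only raises `EOFError` once every copy of the other end of the pipe has been closed. After `Process.start()`, the parent still holds its own copy of the child's connection unless it closes it explicitly. The Python 3 sketch below checks both cases. It assumes the POSIX `fork` start method and uses a hypothetical `failing_child` target. It calls `poll()` with a timeout, so the sketch itself cannot hang.

```python
import multiprocessing

# Assumption: POSIX "fork" start method (as on Linux automation machines).
CTX = multiprocessing.get_context("fork")


def failing_child(conn):
    # Hypothetical stand-in for a child whose init_func raises before it
    # ever reads a request; multiprocessing then ends the child process.
    raise RuntimeError("init_func failed")


def probe(close_child_end, timeout=2.0):
    parent_conn, child_conn = CTX.Pipe()
    proc = CTX.Process(target=failing_child, args=(child_conn,))
    proc.start()
    if close_child_end:
        # Drop the parent's copy of the child's end, so that once the child
        # exits, no process holds that end open any more.
        child_conn.close()
    proc.join()  # the child is dead at this point
    try:
        # poll() reports both "data available" and "EOF", but it gives up
        # after the timeout instead of blocking like a bare recv() would.
        if not parent_conn.poll(timeout):
            return "would hang"
        return parent_conn.recv()
    except (EOFError, OSError) as exc:
        return type(exc).__name__
    finally:
        parent_conn.close()
        if not child_conn.closed:
            child_conn.close()
```

`probe(close_child_end=True)` should return `"EOFError"`, matching the docs. `probe(close_child_end=False)` should report `"would hang"` because the parent keeps the child's end of the pipe open after the child has exited.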
Btw I cannot dig into `Connection.recv` because it is part of the compiled module: lib-dynload/_multiprocessing.so
See my patch for `init_func` in https://bugzilla.mozilla.org/show_bug.cgi?id=1321517, and in particular my comment https://bugzilla.mozilla.org/show_bug.cgi?id=1321517#c22. The patch doesn't fix an exception occurring at an arbitrary place in ServerProxy, but it does address the immediate startup problem. To solve this properly, I think serve.py should stop rolling its own IPC system and instead use a multiprocessing.Queue, following other best practices for multiprocessing in Python.
Flags: needinfo?(ato)
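Below is a minimal Python 3 sketch of the Queue-based approach suggested above, assuming the POSIX `fork` start method. `server_main`, `QueueServerProxy`, `EchoServer` and `broken_init` are hypothetical names, not the actual serve.py API. The child reports whether `init_func` succeeded before any request is sent. Every read in the parent uses a timeout, so a dead child surfaces as an exception rather than a hang.

```python
import multiprocessing
import queue

# Assumption: POSIX "fork" start method, so init_func need not be picklable.
CTX = multiprocessing.get_context("fork")


def server_main(init_func, requests, responses):
    # Child process: create the server first and report the outcome before
    # serving any request, so a failing init_func aborts the parent at once.
    try:
        server = init_func()
    except Exception as exc:
        responses.put(("error", repr(exc)))
        return
    responses.put(("ready", None))
    while True:
        method, args = requests.get()
        if method == "stop":
            break
        try:
            responses.put(("ok", getattr(server, method)(*args)))
        except Exception as exc:
            responses.put(("error", repr(exc)))


class QueueServerProxy:
    """Hypothetical parent-side handle using one queue per direction."""

    def __init__(self, init_func, timeout=10.0):
        self.timeout = timeout
        self.requests = CTX.Queue()
        self.responses = CTX.Queue()
        self.proc = CTX.Process(
            target=server_main,
            args=(init_func, self.requests, self.responses),
            daemon=True,
        )
        self.proc.start()
        try:
            self._result()  # wait for "ready", or raise the child's error
        except Exception:
            self.proc.join(self.timeout)
            raise

    def _result(self):
        # Bounded wait: a dead or stuck child yields an error, not a hang.
        try:
            status, value = self.responses.get(timeout=self.timeout)
        except queue.Empty:
            raise RuntimeError(
                "no response from server (alive: %s)" % self.proc.is_alive())
        if status == "error":
            raise RuntimeError("server error: " + value)
        return value

    def call(self, method, *args):
        self.requests.put((method, args))
        return self._result()

    def stop(self):
        self.requests.put(("stop", ()))
        self.proc.join(self.timeout)


class EchoServer:
    # Hypothetical stand-in for the real fixture server.
    def get_url(self, path):
        return "http://127.0.0.1:8000" + path


def broken_init():
    # Hypothetical init_func that fails during startup.
    raise OSError("port already in use")
```

With one queue per direction, the protocol reduces to plain `(status, value)` tuples. A real rewrite would still need to decide how much of the child's traceback to forward and how long to wait; this sketch leaves both to `repr()` and a single `timeout` parameter.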
OK, that fix looks fine. I will update this bug's summary to reflect the ultimate goal of using multiprocessing.Queue, and move it to the backlog. Thanks.
Assignee: hskupin → nobody
Status: ASSIGNED → NEW
Priority: P1 → P3
Summary: Infinite hang of Marionette when FixtureServer code raises an exception → Rewrite serve.py to use multiprocessing.Queue instead of multiprocessing.Pipe
Severity: normal → S3
Product: Testing → Remote Protocol
Component: Marionette → Marionette Client and Harness
Product: Remote Protocol → Testing