Open Bug 1906583 Opened 4 months ago Updated 7 days ago

Intermittent Mn, Wd | setup error: webdriver.error.UnknownErrorException: unknown error (500): Failed to decode response from marionette (Exiting due to channel error)

Categories

(Remote Protocol :: Marionette, defect, P5)

Unspecified
All
defect

Tracking

(Not tracked)

People

(Reporter: intermittent-bug-filer, Unassigned)

References

(Depends on 2 open bugs)

Details

(Keywords: intermittent-failure)

Filed by: nfay [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=465407101&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/Zki33EktTTWgojWMUsZT3A/runs/0/artifacts/public/logs/live_backing.log


[task 2024-07-08T05:54:50.009Z] 05:54:50     INFO - PID 22699 | [Utility 24595, Main Thread] WARNING: IPC message 'PUtilityProcess::Msg_FOGData' discarded: actor cannot send: file /builds/worker/checkouts/gecko/ipc/glue/ProtocolUtils.cpp:551
[task 2024-07-08T05:54:50.041Z] 05:54:50     INFO - PID 22699 | Destroying context 7f32ab9a2c80 surface 0 on display 7f32b4631600
[task 2024-07-08T05:54:50.701Z] 05:54:50     INFO - PID 22699 | [Parent 24354, Main Thread] WARNING: '!top', file /builds/worker/checkouts/gecko/dom/xul/MenuBarListener.cpp:99
[task 2024-07-08T05:54:51.033Z] 05:54:51     INFO - PID 22699 | [Parent 24354, Main Thread] WARNING: NS_ENSURE_SUCCESS_VOID(rv) failed with result 0x8000FFFF (NS_ERROR_UNEXPECTED): file /builds/worker/checkouts/gecko/dom/file/uri/BlobURLProtocolHandler.cpp:407
[task 2024-07-08T05:54:51.034Z] 05:54:51     INFO - PID 22699 | [Parent 24354, Main Thread] WARNING: NS_ENSURE_TRUE(!!svc) failed: file /builds/worker/checkouts/gecko/dom/file/uri/BlobURLProtocolHandler.cpp:507
[task 2024-07-08T05:54:51.122Z] 05:54:51     INFO - PID 22699 | 1720418091121	geckodriver::browser	DEBUG	Browser process stopped: exit status: 0
[task 2024-07-08T05:54:51.123Z] 05:54:51     INFO - PID 22699 | 1720418091121	webdriver::server	DEBUG	<- 200 OK {"value":null}
[task 2024-07-08T05:54:51.125Z] 05:54:51     INFO - STDOUT: ==================================== ERRORS ====================================
[task 2024-07-08T05:54:51.126Z] 05:54:51     INFO - STDOUT: ________ ERROR at setup of test_accept_and_notify[capabilities0-False] _________
[task 2024-07-08T05:54:51.129Z] 05:54:51     INFO - STDOUT: event_loop = <_UnixSelectorEventLoop running=False closed=False debug=False>
[task 2024-07-08T05:54:51.129Z] 05:54:51     INFO - STDOUT: request = <SubRequest 'bidi_session' for <Function test_accept_and_notify[capabilities0-False]>>
[task 2024-07-08T05:54:51.129Z] 05:54:51     INFO - STDOUT: kwargs = {'capabilities': {'unhandledPromptBehavior': 'accept and notify'}, 'configuration': {'browser': {'args': ['--profile',...FRONTEND': 'noninteractive', 'DISPLAY': ':0', 'ENABLE_E10S': 'false', ...}}}, 'host': '127.0.0.1', 'port': 44619, ...}}
[task 2024-07-08T05:54:51.129Z] 05:54:51     INFO - STDOUT: setup = <function _wrap_asyncgen.<locals>._asyncgen_fixture_wrapper.<locals>.setup at 0x7f6228a0fa60>
[task 2024-07-08T05:54:51.129Z] 05:54:51     INFO - STDOUT: finalizer = <function _wrap_asyncgen.<locals>._asyncgen_fixture_wrapper.<locals>.finalizer at 0x7f62280f4f70>
[task 2024-07-08T05:54:51.131Z] 05:54:51     INFO - STDOUT:     @functools.wraps(func)
[task 2024-07-08T05:54:51.131Z] 05:54:51     INFO - STDOUT:     def _asyncgen_fixture_wrapper(
[task 2024-07-08T05:54:51.132Z] 05:54:51     INFO - STDOUT:         event_loop: asyncio.AbstractEventLoop, request: SubRequest, **kwargs: Any
[task 2024-07-08T05:54:51.132Z] 05:54:51     INFO - STDOUT:     ) -> _R:
[task 2024-07-08T05:54:51.135Z] 05:54:51     INFO - STDOUT:         gen_obj = func(**_add_kwargs(func, kwargs, event_loop, request))
[task 2024-07-08T05:54:51.135Z] 05:54:51     INFO - STDOUT:     
[task 2024-07-08T05:54:51.135Z] 05:54:51     INFO - STDOUT:         async def setup() -> _R:
[task 2024-07-08T05:54:51.136Z] 05:54:51     INFO - STDOUT:             res = await gen_obj.__anext__()
[task 2024-07-08T05:54:51.136Z] 05:54:51     INFO - STDOUT:             return res
[task 2024-07-08T05:54:51.136Z] 05:54:51     INFO - STDOUT:     
[task 2024-07-08T05:54:51.136Z] 05:54:51     INFO - STDOUT:         def finalizer() -> None:
[task 2024-07-08T05:54:51.136Z] 05:54:51     INFO - STDOUT:             """Yield again, to finalize."""
[task 2024-07-08T05:54:51.188Z] 05:54:51     INFO - 
[task 2024-07-08T05:54:51.188Z] 05:54:51     INFO - TEST-PASS | /webdriver/tests/interop/beforeunload_prompt.py | test_accept[capabilities0-False] 
[task 2024-07-08T05:54:51.188Z] 05:54:51     INFO - TEST-PASS | /webdriver/tests/interop/beforeunload_prompt.py | test_accept[capabilities0-True] 
[task 2024-07-08T05:54:51.188Z] 05:54:51     INFO - TEST-UNEXPECTED-ERROR | /webdriver/tests/interop/beforeunload_prompt.py | test_accept_and_notify[capabilities0-False] - setup error: webdriver.error.UnknownErrorException: unknown error (500): Failed to decode response from marionette
[task 2024-07-08T05:54:51.189Z] 05:54:51     INFO - event_loop = <_UnixSelectorEventLoop running=False closed=False debug=False>
[task 2024-07-08T05:54:51.189Z] 05:54:51     INFO - request = <SubRequest 'bidi_session' for <Function test_accept_and_notify[capabilities0-False]>>
[task 2024-07-08T05:54:51.189Z] 05:54:51     INFO - kwargs = {'capabilities': {'unhandledPromptBehavior': 'accept and notify'}, 'configuration': {'browser': {'args': ['--profile',...FRONTEND': 'noninteractive', 'DISPLAY': ':0', 'ENABLE_E10S': 'false', ...}}}, 'host': '127.0.0.1', 'port': 44619, ...}}
[task 2024-07-08T05:54:51.189Z] 05:54:51     INFO - setup = <function _wrap_asyncgen.<locals>._asyncgen_fixture_wrapper.<locals>.setup at 0x7f6228a0fa60>
[task 2024-07-08T05:54:51.190Z] 05:54:51     INFO - finalizer = <function _wrap_asyncgen.<locals>._asyncgen_fixture_wrapper.<locals>.finalizer at 0x7f62280f4f70>
[task 2024-07-08T05:54:51.190Z] 05:54:51     INFO - 
[task 2024-07-08T05:54:51.191Z] 05:54:51     INFO -     @functools.wraps(func)
[task 2024-07-08T05:54:51.191Z] 05:54:51     INFO -     def _asyncgen_fixture_wrapper(
[task 2024-07-08T05:54:51.191Z] 05:54:51     INFO -         event_loop: asyncio.AbstractEventLoop, request: SubRequest, **kwargs: Any
[task 2024-07-08T05:54:51.192Z] 05:54:51     INFO -     ) -> _R:
[task 2024-07-08T05:54:51.192Z] 05:54:51     INFO -         gen_obj = func(**_add_kwargs(func, kwargs, event_loop, request))
[task 2024-07-08T05:54:51.193Z] 05:54:51     INFO -     
[task 2024-07-08T05:54:51.193Z] 05:54:51     INFO -         async def setup() -> _R:
[task 2024-07-08T05:54:51.193Z] 05:54:51     INFO -             res = await gen_obj.__anext__()
[task 2024-07-08T05:54:51.193Z] 05:54:51     INFO -             return res
[task 2024-07-08T05:54:51.194Z] 05:54:51     INFO -     
[task 2024-07-08T05:54:51.194Z] 05:54:51     INFO -         def finalizer() -> None:
[task 2024-07-08T05:54:51.194Z] 05:54:51     INFO -             """Yield again, to finalize."""
[task 2024-07-08T05:54:51.194Z] 05:54:51     INFO -     
[task 2024-07-08T05:54:51.195Z] 05:54:51     INFO -             async def async_finalizer() -> None:
[task 2024-07-08T05:54:51.195Z] 05:54:51     INFO -                 try:
[task 2024-07-08T05:54:51.195Z] 05:54:51     INFO -                     await gen_obj.__anext__()
[task 2024-07-08T05:54:51.195Z] 05:54:51     INFO -                 except StopAsyncIteration:
[task 2024-07-08T05:54:51.195Z] 05:54:51     INFO -                     pass
[task 2024-07-08T05:54:51.195Z] 05:54:51     INFO -                 else:
[task 2024-07-08T05:54:51.195Z] 05:54:51     INFO -                     msg = "Async generator fixture didn't stop."
[task 2024-07-08T05:54:51.195Z] 05:54:51     INFO -                     msg += "Yield only once."
[task 2024-07-08T05:54:51.195Z] 05:54:51     INFO -                     raise ValueError(msg)
[task 2024-07-08T05:54:51.196Z] 05:54:51     INFO -     
[task 2024-07-08T05:54:51.196Z] 05:54:51     INFO -             event_loop.run_until_complete(async_finalizer())
[task 2024-07-08T05:54:51.196Z] 05:54:51     INFO -     
[task 2024-07-08T05:54:51.196Z] 05:54:51     INFO - >       result = event_loop.run_until_complete(setup())

The underlying failure is a potential crash during the shutdown of Firefox:
https://treeherder.mozilla.org/logviewer?job_id=465407101&repo=autoland&lineNumber=194369-194384

[task 2024-07-08T05:54:08.749Z] 05:54:08     INFO - PID 22699 | 1720418048748	Marionette	DEBUG	0 -> [0,46,"Marionette:Quit",{"flags":["eForceQuit"]}]
[task 2024-07-08T05:54:08.750Z] 05:54:08     INFO - PID 22699 | 1720418048750	Marionette	INFO	Stopped listening on port 35322
[task 2024-07-08T05:54:08.781Z] 05:54:08     INFO - PID 22699 | [Child 23092, IPC I/O Child] WARNING: [DD8138FB422205F3.96665EDE0126D631]: Dropping message '<null>'; no connection to unknown peer 1.1: file /builds/worker/checkouts/gecko/ipc/glue/NodeController.cpp:365
[task 2024-07-08T05:54:08.782Z] 05:54:08     INFO - PID 22699 | [Child 23069, IPC I/O Child] WARNING: [F56011DAFCE85B03.C674D9DC131A134C]: Dropping message '<null>'; no connection to unknown peer 1.1: file /builds/worker/checkouts/gecko/ipc/glue/NodeController.cpp:365
[task 2024-07-08T05:54:08.784Z] 05:54:08     INFO - PID 22699 | [Child 23022, IPC I/O Child] WARNING: [E35855B34FA21CFD.9E9B186565B45410]: Dropping message '<null>'; no connection to unknown peer 1.1: file /builds/worker/checkouts/gecko/ipc/glue/NodeController.cpp:365
[task 2024-07-08T05:54:08.785Z] 05:54:08     INFO - PID 22699 | [Utility 22955, IPC I/O Child] WARNING: [B921783F44010B5B.247B520DBFA9C8CB]: Dropping message '<null>'; no connection to unknown peer 1.1: file /builds/worker/checkouts/gecko/ipc/glue/NodeController.cpp:365
[task 2024-07-08T05:54:08.789Z] 05:54:08     INFO - PID 22699 | Exiting due to channel error.
[task 2024-07-08T05:54:08.790Z] 05:54:08     INFO - PID 22699 | Exiting due to channel error.
[task 2024-07-08T05:54:08.791Z] 05:54:08     INFO - PID 22699 | Exiting due to channel error.
[task 2024-07-08T05:54:08.791Z] 05:54:08     INFO - PID 22699 | [Utility 22955, Main Thread] WARNING: Shutting down Utility process early due to a crash!: file /builds/worker/checkouts/gecko/ipc/glue/UtilityProcessChild.cpp:358
[task 2024-07-08T05:54:08.792Z] 05:54:08     INFO - PID 22699 | [Child 22809, IPC I/O Child] WARNING: [E23DCDDCE61710D3.194247E91B21DA52]: Dropping message '<null>'; no connection to unknown peer 1.1: file /builds/worker/checkouts/gecko/ipc/glue/NodeController.cpp:365
[task 2024-07-08T05:54:08.792Z] 05:54:08     INFO - PID 22699 | Exiting due to channel error.
[task 2024-07-08T05:54:08.794Z] 05:54:08     INFO - PID 22699 | [Child 22915, IPC I/O Child] WARNING: [F87F755AC18D5FFE.7B9947573577C9BB]: Dropping message '<null>'; no connection to unknown peer 1.1: file /builds/worker/checkouts/gecko/ipc/glue/NodeController.cpp:365
[task 2024-07-08T05:54:08.794Z] 05:54:08     INFO - PID 22699 | Exiting due to channel error.
[task 2024-07-08T05:54:08.794Z] 05:54:08     INFO - PID 22699 | [Child 22843, IPC I/O Child] WARNING: [96FE4553C6936EB4.60451CFC0AA73145]: Dropping message '<null>'; no connection to unknown peer 1.1: file /builds/worker/checkouts/gecko/ipc/glue/NodeController.cpp:365
[task 2024-07-08T05:54:08.794Z] 05:54:08     INFO - PID 22699 | Exiting due to channel error.
[task 2024-07-08T05:54:08.794Z] 05:54:08     INFO - PID 22699 | [23008] Sandbox: Unexpected EOF, op 0 flags 00 path /proc/cpuinfo
[task 2024-07-08T05:54:08.795Z] 05:54:08     INFO - PID 22699 | 1720418048786	webdriver::server	DEBUG	Teardown session
[task 2024-07-08T05:54:08.795Z] 05:54:08     INFO - PID 22699 | 1720418048786	geckodriver::marionette	ERROR	Failed to close browser connection: unknown error

Could this be related to the sandbox unexpected EOF output?

Flags: needinfo?(nika)

The log line I assume you're mentioning as the "potential crash" (Shutting down Utility process early due to a crash!) is just sent if there's an unexpected connection error from the UtilityProcessChild to the main process (https://searchfox.org/mozilla-central/rev/54a5c4f14f3eb514a29e0cebcb5a095144bcd450/ipc/glue/UtilityProcessChild.cpp#358). I think the wording of the crash is probably a bit overly scary-sounding, but it seems like the parent process is exiting without having cleaned up the child processes completely, and then the child processes are shutting themselves down.

I'm skeptical that the python exception is caused by a child process sandbox error after the parent process has gone away.

Flags: needinfo?(nika)

Thanks Nika. Given that we saw this kind of unexpected shutdown behavior already multiple times in the past I wonder if that is something we can just ignore or if we should get a separate bug filed to get that fixed. I assume the parent should wait until all the content processes have shutdown on their own?

Flags: needinfo?(nika)

Yeah, in normal operation, the parent process shouldn't exit until all child processes have fully shut down, which is why this is a bit of an abnormal situation, suggesting the parent process was killed or similar.

Flags: needinfo?(nika)

Thanks Nika. I filed bug 1908293 to track this on a platform level.

Summary: Intermittent Wd | setup error: webdriver.error.UnknownErrorException: unknown error (500): Failed to decode response from marionette → Intermittent Wd | setup error: webdriver.error.UnknownErrorException: unknown error (500): Failed to decode response from marionette (Exiting due to channel error)
Summary: Intermittent Wd | setup error: webdriver.error.UnknownErrorException: unknown error (500): Failed to decode response from marionette (Exiting due to channel error) → Intermittent Mn, Wd | setup error: webdriver.error.UnknownErrorException: unknown error (500): Failed to decode response from marionette (Exiting due to channel error)
No longer blocks: 1894213

In bug 1894213 comment 41, Whimboo recommends we make this bug specific to Android as a regression from his work on bug 1920573. With the test expectation updates in bug 1894213 we could mark this bug as fixed.

Keywords: regression
OS: Unspecified → Android
Regressed by: 1920573
Summary: Intermittent Mn, Wd | setup error: webdriver.error.UnknownErrorException: unknown error (500): Failed to decode response from marionette (Exiting due to channel error) → Android Intermittent Mn, Wd | setup error: webdriver.error.UnknownErrorException: unknown error (500): Failed to decode response from marionette (Exiting due to channel error)
Depends on: 1894213
See Also: 1894213

Set release status flags based on info from the regressing bug 1920573

(In reply to Chris Peterson [:cpeterson] from comment #12)

In bug 1894213 comment 41, Whimboo recommends we make this bug specific to Android as a regression from his work on bug 1920573. With the test expectation updates in bug 1894213 we could mark this bug as fixed.

Well, meanwhile bug 1894213 is about the Android specific regression. This one tracks other platforms. I explicitly left bug 1894213 open for the remaining work. As such I'm restoring the former bug's state.

No longer depends on: 1894213
Keywords: regression
OS: Android → All
No longer regressed by: 1920573
See Also: → 1894213
Summary: Android Intermittent Mn, Wd | setup error: webdriver.error.UnknownErrorException: unknown error (500): Failed to decode response from marionette (Exiting due to channel error) → Intermittent Mn, Wd | setup error: webdriver.error.UnknownErrorException: unknown error (500): Failed to decode response from marionette (Exiting due to channel error)
Duplicate of this bug: 1922862

In some cases this is triggered by having too many open files. That issue is covered by bug 1918983.

Depends on: 1918983
You need to log in before you can comment on or make changes to this bug.