ABORT: failed to re-open freezeable shm: Too many open files: file /builds/worker/checkouts/gecko/ipc/chromium/src/base/shared_memory_posix.cc:345)
Categories: Core :: IPC, defect
People: Reporter: whimboo, Unassigned
References: Blocks 1 open bug
Seen in a WebDriver web-platform test for Firefox 128 when triggering a navigation to https://web-platform.test:8443/webdriver/tests/bidi/network/support/empty.html:
[task 2024-09-05T13:07:47.613Z] 13:07:47 INFO - PID 1602 | [Parent 1625, Main Thread] ###!!! ABORT: failed to re-open freezeable shm: Too many open files: file /builds/worker/checkouts/gecko/ipc/chromium/src/base/shared_memory_posix.cc:345
[task 2024-09-05T13:07:47.613Z] 13:07:47 INFO - STDOUT: Initializing stack-fixing for the first stack frame, this may take a while...
[task 2024-09-05T13:08:11.273Z] 13:08:11 INFO - PID 1602 | #01: NS_DebugBreak [xpcom/base/nsDebugImpl.cpp:469]
[task 2024-09-05T13:08:11.273Z] 13:08:11 INFO - PID 1602 | #02: base::SharedMemory::CreateInternal(unsigned long, bool) [ipc/chromium/src/base/shared_memory_posix.cc:0]
[task 2024-09-05T13:08:11.274Z] 13:08:11 INFO - PID 1602 | #03: mozilla::ipc::MemMapSnapshot::Init(unsigned long) [dom/ipc/MemMapSnapshot.cpp:19]
[task 2024-09-05T13:08:11.275Z] 13:08:11 INFO - PID 1602 | #04: mozilla::dom::ipc::WritableSharedMap::Serialize() [dom/ipc/SharedMap.cpp:307]
[task 2024-09-05T13:08:11.275Z] 13:08:11 INFO - PID 1602 | #05: mozilla::dom::ipc::WritableSharedMap::BroadcastChanges() [dom/ipc/SharedMap.cpp:364]
[task 2024-09-05T13:08:11.276Z] 13:08:11 INFO - PID 1602 | #06: mozilla::dom::MozWritableSharedMap_Binding::flush(JSContext*, JS::Handle<JSObject*>, void*, JSJitMethodCallArgs const&) [s3:gecko-generated-sources:dd0ac67eb9a51c28b612af09a8cd4f73062fd999b56abf09c458191f32d43a6adaaed15f1b85827b297020094a288a1601a4fcc0ccbaad657920e4f7add05328/dom/bindings/MozSharedMapBinding.cpp::1538]
[task 2024-09-05T13:08:11.277Z] 13:08:11 INFO - PID 1602 | #07: mozilla::dom::binding_detail::GenericMethod<mozilla::dom::binding_detail::NormalThisPolicy, mozilla::dom::binding_detail::ThrowExceptions>(JSContext*, unsigned int, JS::Value*) [dom/bindings/BindingUtils.cpp:3270]
Reporter
Comment 1•11 months ago
Given that this causes a hang in Firefox, I wonder if it might be related to bug 1832294, where we see similar hangs once in a while when navigating via the WebDriver BiDi browsingContext.navigate command.
Comment 2•11 months ago
Hmm, though, I'm not sure if this is really about IPC, but bug 1463587 was there.
Comment 3•11 months ago
This is an exception raised due to file descriptor exhaustion, which does line up with the other "Failed to duplicate file handle for current process!" errors before and after this line.
It sounds like there is a chance that we have some kind of file descriptor leak in that test case if this is happening reliably, which would be interesting to isolate and figure out the cause of. Given we don't see similar logs in other hangs, it seems unlikely that the other timeouts are due to FD exhaustion.
These exact logs likely won't show up on Linux, though, as we use a slightly different shared memory backend there, so perhaps the errors could end up looking slightly different?
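If it does reproduce reliably, one low-tech way to confirm the leak is to watch the descriptor count grow over time. A minimal sketch, assuming psutil is installed and using the parent PID from the log above purely as an illustration:

import time

import psutil  # assumption: psutil is available in the local Python environment

FIREFOX_PID = 1625  # hypothetical: taken from the "[Parent 1625, ...]" log line above

def watch_fds(pid, interval=5.0):
    proc = psutil.Process(pid)
    while proc.is_running():
        # num_fds() reports the process's current open descriptor count (POSIX only).
        print(f"{time.strftime('%H:%M:%S')} open fds: {proc.num_fds()}")
        time.sleep(interval)

watch_fds(FIREFOX_PID)

A count that climbs steadily across navigations would point at a leak rather than a one-off spike.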
Reporter
Comment 4•11 months ago
So maybe it's related to https://github.com/web-platform-tests/wpt/issues/27072. It's not clear to me where exactly in the stack the file handle exhaustion originates. Maybe it's wptserve, given that it is the tool that runs the whole time when executing web-platform tests.
It's fairly easy for me to reproduce, and I mentioned the steps here; they basically amount to running this mach command and letting it run for a while (assuming the test doesn't fail due to another assertion first):
mach wpt --webdriver-binary=target/debug/geckodriver --webdriver-arg=-vv testing/web-platform/tests/webdriver/tests/switch_to_frame/switch.py --repeat-until-unexpected
I would appreciate some feedback on how to figure out where exactly we actually leak file handles.
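As a starting point, here is a rough sketch of one way this could be narrowed down (purely illustrative, not an established tool): snapshot lsof output for the Firefox parent process between test iterations and diff the sets, so descriptors that keep accumulating stand out.

import subprocess

def lsof_lines(pid):
    # -p restricts lsof to a single process; each output line identifies one descriptor.
    out = subprocess.run(["lsof", "-p", str(pid)],
                         capture_output=True, text=True).stdout
    return set(out.splitlines()[1:])  # drop the header row

def new_descriptors(before, after):
    # Entries present only in the later snapshot; if this set keeps growing
    # across iterations, those descriptors are the leak candidates.
    return sorted(after - before)

# Hypothetical usage (PID taken from the log above, for illustration only):
# before = lsof_lines(1625)
# ... run one navigation / test iteration ...
# print("\n".join(new_descriptors(before, lsof_lines(1625))))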
Reporter
Comment 5•10 months ago
Jed, if you could give some advice, that would be great. I see quite a lot of these failures for Wd jobs in CI. Thanks.
Comment 6•10 months ago
I just ran lsof on my Mac, and the firefox process has a lot of file descriptors listed with type rte, which the man page says are AF_ROUTE sockets. I wouldn't expect us to need more than one, let alone >400 of them, so maybe Necko has a leak? Indeed, it looks like the socket opened here (thanks, searchfox) is never closed. I'll file a bug.
Also, I don't think the wpt github issue is related — that's reporting fd exhaustion in a Python process that runs Firefox, and it's the error code for exceeding the per-process limit rather than the systemwide limit (so excessive fd use in Firefox wouldn't contribute to it).
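For anyone who wants to reproduce that tally, a small sketch (assuming lsof is on PATH; the PID is the one from the log above, used only for illustration) that groups a process's descriptors by lsof's TYPE column:

import collections
import subprocess

def fd_types(pid):
    out = subprocess.run(["lsof", "-p", str(pid)],
                         capture_output=True, text=True).stdout
    counts = collections.Counter()
    for line in out.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 5:
            counts[fields[4]] += 1  # the fifth column is TYPE (e.g. REG, unix, rte)
    return counts

# print(fd_types(1625).most_common(10))  # hypothetical PID from the log above

A few hundred "rte" entries in that output would match the AF_ROUTE observation above.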
Reporter
Comment 7•9 months ago
I haven't seen this issue anymore since bug 1925667 got fixed. Let's close as WFM so that we can easily reopen it if this happens again. Thanks, Jed!