Crash when running with WGPU_TRACE environment
Categories
(Core :: Graphics: WebGPU, defect, P2)
People
(Reporter: kvark, Unassigned)
References
(Blocks 1 open bug)
Attachments
(2 files)
Having WGPU_TRACE working is essential to debugging issues. It now appears to crash, rather mysteriously, on a specific page (attached). Logs are also attached.
Trying to catch it in either the child or the GPU process doesn't yield anything. The crash seems to originate from the IPC thread, and the first relevant log messages are:
[Parent 664563, IPC I/O Parent] WARNING: Message needs unreceived descriptors channel:7f1e574d8d00 message-type:65531 header()->num_fds:1 num_fds:0 fds_i:0: file /mnt/code/firefox/_webgpu/ipc/chromium/src/chrome/common/ipc_channel_posix.cc:507
###!!! [Child][MessageChannel] Error: (msgtype=0xB40001,name=PWebGPU::Msg_DeviceAction) Channel error: cannot send/recv
[2021-04-10T17:30:56Z INFO wgpu_core::device] Created buffer Valid((2173, 1, Vulkan)) with BufferDescriptor { label: None, size: 576, usage: VERTEX, mapped_at_creation: true }
[Child 664650, IPC I/O Child] WARNING: FileDescriptorSet destroyed with unconsumed descriptors: file /mnt/code/firefox/_webgpu/ipc/chromium/src/chrome/common/file_descriptor_set_posix.cc:19
Reporter | Comment 1 • 4 years ago
Attaching the test case.
Reporter | Comment 2 • 4 years ago
I used this function to decode the message type: 65531 = SHMEM_CREATED_MESSAGE.
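
For context, 65531 is u16::MAX - 4, so these special channel messages evidently sit at the very top of the 16-bit message-type range. Below is a minimal Rust sketch of the decoding idea; the only mapping confirmed by the log above is 65531 = SHMEM_CREATED_MESSAGE, so anything else is reported as unknown rather than guessed:

fn decode_message_type(msg_type: u32) -> &'static str {
    match msg_type {
        // Confirmed by this bug's log: message-type:65531.
        65531 => "SHMEM_CREATED_MESSAGE",
        _ => "unknown",
    }
}

fn main() {
    println!("{}", decode_message_type(65531)); // SHMEM_CREATED_MESSAGE
}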
Reporter | Comment 3 • 4 years ago
This isn't a crash, it seems. Adding an assertion there gives me a proper call stack:
Assertion failure: false, at /mnt/code/firefox/_webgpu/ipc/chromium/src/chrome/common/ipc_channel_posix.cc:498
#01: IPC::Channel::ChannelImpl::OnFileCanReadWithoutBlocking(int) (/mnt/code/firefox/_webgpu/ipc/chromium/src/chrome/common/ipc_channel_posix.cc:828)
#02: base::MessagePumpLibevent::OnLibeventNotification(int, short, void*) (/mnt/code/firefox/_webgpu/ipc/chromium/src/base/message_pump_libevent.cc:251)
#03: event_process_active_single_queue (/mnt/code/firefox/_webgpu/ipc/chromium/src/third_party/libevent/event.c:1639)
#04: event_base_loop (/mnt/code/firefox/_webgpu/ipc/chromium/src/third_party/libevent/event.c:1961)
#05: base::MessagePumpLibevent::Run(base::MessagePump::Delegate*) (/mnt/code/firefox/_webgpu/ipc/chromium/src/base/message_pump_libevent.cc:0)
#06: MessageLoop::RunInternal() (/mnt/code/firefox/_webgpu/ipc/chromium/src/base/message_loop.cc:0)
#07: MessageLoop::Run() (/mnt/code/firefox/_webgpu/ipc/chromium/src/base/message_loop.cc:311)
#08: base::Thread::ThreadMain() (/mnt/code/firefox/_webgpu/ipc/chromium/src/base/thread.cc:194)
#09: ThreadFunc(void*) (/mnt/code/firefox/_webgpu/ipc/chromium/src/base/platform_thread_posix.cc:41)
Reporter | Comment 4 • 4 years ago
Here are a few more details of the story:
- the application creates 3000 * N resources at once, many of which end up with an associated Shmem on our side
- we aren't freeing them nearly as fast as we are asked to create them
- when WGPU_TRACE is enabled, we are also writing many files (~5000 of them) in the GPU/parent process; all of them are written with std::fs in Rust, and all of these handles are closed
Perhaps we are running into some kind of file descriptor exhaustion?
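
A quick way to sanity-check that hypothesis on Linux is to watch the process's open-fd count against its limit. Here is a minimal, standard-library-only sketch (the /proc paths are Linux-specific; this is a diagnostic aid, not part of any fix):

use std::fs;

// Rough Linux-only probe for fd pressure: count the entries in
// /proc/self/fd and print the "Max open files" row from /proc/self/limits.
fn open_fd_count() -> std::io::Result<usize> {
    // Every open descriptor appears as one entry here (plus the
    // descriptor that read_dir itself holds while iterating).
    Ok(fs::read_dir("/proc/self/fd")?.count())
}

fn main() -> std::io::Result<()> {
    let limits = fs::read_to_string("/proc/self/limits")?;
    let max_open = limits
        .lines()
        .find(|line| line.starts_with("Max open files"))
        .unwrap_or("Max open files: unknown");
    println!("open fds: {}", open_fd_count()?);
    println!("{}", max_open);
    Ok(())
}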
Comment 5 • 4 years ago
(In reply to Dzmitry Malyshau [:kvark] from comment #4)
> Here are a few more details of the story:
> - the application creates 3000 * N resources at once, many of which end up with an associated Shmem on our side
> - we aren't freeing them nearly as fast as we are asked to create them
> - when WGPU_TRACE is enabled, we are also writing many files (~5000 of them) in the GPU/parent process; all of them are written with std::fs in Rust, and all of these handles are closed
>
> Perhaps we are running into some kind of file descriptor exhaustion?
That is likely to be the problem. We have a limit of 4096 file descriptors per process, and the last time this was investigated, we found that some Linux distros have a hard limit of 4k, so we can't raise it further in that case.
This probably would have been more immediately obvious if not for a lack of checks for fd exhaustion in some relevant places.
Shmem will automatically close its file descriptor after it's been mapped in each of the two processes involved, but if a large number are created at once, we could have a large peak number of fds. (Note that Linux also has a limit of 64k virtual memory areas due to annoying historical issues, and bug 1700687 reveals that we're already reaching ⅓ of that limit.)
It's going to be necessary to merge these resources into fewer shared memory segments somehow.
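
One possible shape for that, sketched below with a hypothetical Segment type standing in for a real Shmem mapping: suballocate many small buffers out of one large segment with a bump allocator, so the fd count scales with segments rather than with resources.

/// Hypothetical stand-in for a shared-memory mapping; a real version
/// would wrap an OS shared mapping (one fd per Segment, not per buffer).
struct Segment {
    bytes: Vec<u8>,
    cursor: usize,
}

impl Segment {
    fn new(size: usize) -> Self {
        Segment { bytes: vec![0; size], cursor: 0 }
    }

    /// Bump-allocate `size` bytes at `align`; returns the offset into
    /// the segment, or None once the segment is full.
    fn alloc(&mut self, size: usize, align: usize) -> Option<usize> {
        let start = (self.cursor + align - 1) / align * align;
        if start + size > self.bytes.len() {
            return None;
        }
        self.cursor = start + size;
        Some(start)
    }
}

fn main() {
    // One 1 MiB segment can back well over a thousand 576-byte buffers
    // (the size from the log above) while costing a single descriptor.
    let mut segment = Segment::new(1 << 20);
    let mut count = 0;
    while segment.alloc(576, 256).is_some() {
        count += 1;
    }
    println!("{} suballocations from one segment", count);
}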
Reporter | Comment 6 • 3 years ago
I no longer think this should be blocking MVP. It doesn't affect users.